Abstract

We provide a model to investigate the tension between information aggregation and
spread of misinformation in large societies (conceptualized as networks of agents
communicating with each other). Each individual holds a belief represented by a scalar.
Individuals meet pairwise and exchange information, which is modeled as both
individuals adopting the average of their pre-meeting beliefs. When all individuals engage in
this type of information exchange, the society will be able to effectively aggregate the
initial information held by all individuals. There is also the possibility of misinformation,
however, because some of the individuals are "forceful," meaning that they influence the
beliefs of (some of) the other individuals they meet, but do not change their own opinion.
The paper characterizes how the presence of forceful agents interferes with information
aggregation. Under the assumption that even forceful agents obtain some information
(however infrequent) from some others (and additional weak regularity conditions), we
first show that beliefs in this class of societies converge to a consensus among all
individuals. This consensus value is a random variable, however, and we characterize its
behavior. Our main results quantify the extent of misinformation in the society by either
providing bounds or exact results (in some special cases) on how far the consensus value
can be from the benchmark without forceful agents (where there is efficient information
aggregation). The worst outcomes obtain when there are several forceful agents and
forceful agents themselves update their beliefs only on the basis of information they
obtain from individuals most likely to have received their own information previously.
1 Introduction

Individuals form beliefs on various economic, political and social variables ("state") 
based on information they receive from others, including friends, neighbors and cowork- 
ers as well as local leaders, news sources and political actors. A key tradeoff faced by 
any society is whether this process of information exchange will lead to the formation 
of more accurate beliefs or to certain systematic biases and spread of misinformation. 
A famous idea going back to Condorcet's Jury Theorem (now often emphasized in the 
context of ideas related to "wisdom of the crowds") encapsulates the idea that exchange
of dispersed information will enable socially beneficial aggregation of information. How- 
ever, as several examples ranging from the effects of the Swift Boat ads during the 2004 
presidential campaign to the beliefs in the Middle East that 9/11 was a US or Israeli 
conspiracy illustrate, in practice social groups are often swayed by misleading ads, media 
outlets, and political leaders, and hold on to incorrect and inaccurate beliefs. 

A central question for social science is to understand the conditions under which 
exchange of information will lead to the spread of misinformation instead of aggregation 
of dispersed information. In this paper, we take a first step towards developing and 
analyzing a framework for providing answers to this question. While the issue of misin- 
formation can be studied using Bayesian models, non-Bayesian models appear to provide 
a more natural starting point.^1 Our modeling strategy is therefore to use a non-Bayesian
model, which however is reminiscent of a Bayesian model in the absence of "forceful" 
agents (who are either trying to mislead or influence others or are, for various rational 
or irrational reasons, not interested in updating their opinions). 

We consider a society envisaged as a social network of n agents, communicating and 
exchanging information. Specifically, each agent is interested in learning some underlying
state $\theta \in \mathbb{R}$ and receives a signal $x_i(0) \in \mathbb{R}$ in the beginning. We assume that
$\theta = \frac{1}{n} \sum_{i=1}^{n} x_i(0)$, so that information about the relevant state is dispersed and this
information can be easily aggregated if the agents can communicate in a centralized or 
decentralized fashion. 

Information exchange between agents takes place as follows: Each individual is "rec- 
ognized" according to a Poisson process in continuous time and conditional on this event, 
meets one of the individuals in her social neighborhood according to a pre-specified 
stochastic process. We think of this stochastic process as representing an underlying 
social network (for example, friendships, information networks, etc.). Following this 
meeting, there is a potential exchange of information between the two individuals, af- 
fecting the beliefs of one or both agents. We distinguish between two types of individuals: 
regular or forceful. When two regular agents meet, they update their beliefs to be equal 
to the average of their pre-meeting beliefs. This structure, though non-Bayesian, has a

^1 In particular, misinformation can arise in a Bayesian model if an agent (receiver) is unsure of the
type of another agent (sender) providing her with information and the sender happens to be of a type 
intending to mislead the receiver. Nevertheless, this type of misinformation will be limited since if the 
probability that the sender is of the misleading type is high, the receiver will not change her beliefs 
much on the basis of the sender's communication. 
simple and appealing interpretation and ensures the convergence of beliefs to the
underlying state $\theta$ when the society consists only of regular agents.^2 In contrast, when
an agent meets a forceful agent, this may result in the forceful agent "influencing" his
beliefs so that this individual inherits the forceful agent's belief except for an $\epsilon$ weight
on his pre-meeting belief.^3 Our modeling of forceful agents is sufficiently general to nest
both individuals (or media outlets) that purposefully wish to influence others with their
opinion or individuals who, for various reasons, may have more influence with some
subset of the population.^4 A key assumption of our analysis is that even forceful agents
engage in some updating of their beliefs (even if infrequently) as a result of exchange
of information with their own social neighborhoods. This assumption captures the
intuitive notion that "no man is an island" and thus receives some nontrivial input from
the social context in which he or she is situated.^5 The influence pattern of social agents
superimposed over the social network can be described by directed links, referred to 
as forceful links, and creates a richer stochastic process, representing the evolution of 
beliefs in the society. Both with and without forceful agents, the evolution of beliefs 
can be represented by a Markov chain and our analysis will exploit this connection. We 
will frequently distinguish the Markov chain representing the evolution of beliefs and 
the Markov chain induced by the underlying social network (i.e., just corresponding to 
the communication structure in the society, without taking into account the influence 
pattern) and properties of both will play a central role in our results. 

Our objective is to characterize the evolution of beliefs and quantify the effect of 
forceful agents on public opinion in the context of this model. Our first result is that,
despite the presence of forceful agents, the opinion of all agents in this social network
converges to a common, though stochastic, value under weak regularity conditions. More
formally, each agent's opinion converges to a value given by $\pi' x(0)$, where $x(0)$ is the
vector of initial beliefs and $\pi$ is a random vector. Our measure of spread of misinformation
in the society will be $\bar{\pi}' x(0) - \theta = \sum_{i=1}^{n} (\bar{\pi}_i - 1/n) x_i(0)$, where $\bar{\pi}$ is the expected
value of $\pi$ and $\bar{\pi}_i$ denotes its $i$th component. The greater is this gap, the greater is
the potential for misinformation in this society. Moreover, this formula also makes it
clear that $\bar{\pi}_i - 1/n$ gives the excess influence of agent $i$. Our strategy will be to develop

^2 The appealing interpretation is that this type of averaging would be optimal if both agents had
beliefs drawn from a normal distribution with mean equal to the underlying state and equal precision.
This interpretation is discussed in detail in DeMarzo, Vayanos, and Zwiebel [16] in a related context.

^3 When $\epsilon = 1/2$, then the individual treats the forceful agent just as any other regular agent (is
not influenced by him over and above the information exchange) and the only difference from the
interaction between two regular agents is that the forceful agent himself does not update his beliefs. All
of our analysis is conducted for arbitrary $\epsilon$, so whether forceful agents are also "influential" in pairwise
meetings is not important for any of our findings.

^4 What we do not allow are individuals who know the underlying state and try to convince others of
some systematic bias relative to the underlying state, though the model could be modified to fit this
possibility as well.

^5 When there are several forceful agents and none of them ever change their opinion, then it is
straightforward to see that opinions in this society will never settle into a "stationary" distribution. 
While this case is also interesting to study, it is significantly more difficult to analyze and requires a 
different mathematical approach. 
bounds on the spread of misinformation in the society (as defined above) and on the
excess influence of each agent for general social networks and also provide exact results 
for some special networks. 
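
The misinformation measure and the excess influence defined above are plain weighted sums, so they can be evaluated directly once a candidate expected weight vector is available. The following is a minimal sketch, not a procedure taken from the paper: the function names `excess_influence` and `misinformation_gap` and the numbers in the example are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def excess_influence(pi_bar):
    """Return pi_bar[i] - 1/n for every agent i."""
    n = len(pi_bar)
    return [p - Fraction(1, n) for p in pi_bar]


def misinformation_gap(pi_bar, x0):
    """Return pi_bar' x(0) - theta, where theta is the average of x(0)."""
    return sum(e * x for e, x in zip(excess_influence(pi_bar), x0))


# Hypothetical example: agent 0 carries half of the expected consensus weight.
x0 = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
pi_bar = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
gap = misinformation_gap(pi_bar, x0)

# With uniform weights 1/n no agent has excess influence and the gap vanishes.
uniform_gap = misinformation_gap([Fraction(1, 4)] * 4, x0)
```

Exact rational arithmetic is used here only so that the identities can be checked without rounding error.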

We provide three types of results. First, using tools from matrix perturbation theory,^6
we provide global and general upper bounds on the extent of misinformation as a
function of the properties of the underlying social network. In particular, the bounds 
relate to the spectral gap and the mixing properties of the Markov chain induced by the 
social network. Recall that a Markov chain is fast-mixing if it converges rapidly to its 
stationary distribution. It will do so when it has a large spectral gap, or loosely speak- 
ing, when it is highly connected and possesses many potential paths of communication 
between any pair of agents. Intuitively, societies represented by fast-mixing Markov 
chains have more limited room for misinformation because forceful agents themselves 
are influenced by the weighted opinion of the rest of the society before they can spread 
their own (potentially extreme) views. A corollary of these results is that for a spe- 
cial class of societies, corresponding to "expander graphs", misinformation disappears 
in large societies provided that there is a finite number of forceful agents and no forceful 
agent has global impact.^7 In contrast, the extent of misinformation can be substantial
in slow-mixing Markov chains, also for an intuitive reason. Societies represented by 
such Markov chains would have a high degree of partitioning (multiple clusters with 
weak communication in between), so that forceful agents receive their information from 
others who previously were influenced by them, ensuring that their potentially extreme 
opinions are never moderated.^8

Our second set of results exploits the local structure of the social network in the
neighborhood of each forceful agent in order to provide a tighter characterization of the 
extent of misinformation and excess influence. Fast-mixing and spectral gap properties 
are global (and refer to the properties of the overall social network representing meeting 
and communication patterns among all agents). As such, they may reflect properties of 
a social network far from where the forceful agents are located. If so, our first set of 
bounds will not be tight. To redress this problem, we develop an alternative analysis 
using mean (first) passage times of the Markov chain and show how it is not only the 
global properties of the social network, but also the local social context in which forceful 
agents are situated that matter. For example, in a social network with a single dense 
cluster and several non-clustered pockets, it matters greatly whether forceful links are 
located inside the cluster or not. We illustrate this result sharply by first focusing 

^6 In particular, we decompose the transition matrix of the Markov chain into a doubly stochastic
matrix, representing the underlying social network, and a remainder matrix, representing a directed
influence graph. Despite the term "perturbation," this remainder matrix need not be "small" in any
sense.

^7 Expander graphs are graphs whose spectral gap remains bounded away from zero as the number
of nodes tends to infinity. Several networks related to the Internet correspond to expander graphs; see,
for example, Mihail, Papadimitriou, and Saberi [27].

^8 This result is related to Golub and Jackson [20], where they relate learning to homophily properties
of the social network. 
on graphs with forceful essential edges, that is, graphs representing societies in which
a single forceful link connects two otherwise disconnected components. This, loosely 
speaking, represents a situation in which a forceful agent, for example a media outlet or 
a political party leader, obtains all of its (or his or her) information from a small group 
of individuals and influences the rest of the society. In this context, we establish the 
surprising result that all members of the small group will have the same excess influence, 
even though some of them may have much weaker links or no links to the forceful agent. 
This result is an implication of the society having a (single) forceful essential edge and 
reflects the fact that the information among the small group of individuals who are the 
source of information of the forceful agent aggregates rapidly and thus it is the average 
of their beliefs that matters. We then generalize these results and intuitions to more
general graphs using the notion of information bottlenecks. 

Our third set of results is more technical in nature, and provides new conceptual
tools and algorithms for characterizing the role of information bottlenecks. In particular, 
we introduce the concept of relative cuts and present several new results related to 
relative cuts and how these relate to mean first passage times. For our purposes, these 
new results are useful because they enable us to quantify the extent of local clustering 
around forceful agents. Using the notion of relative cuts, we develop new algorithms 
based on graph clustering that enable us to provide improved bounds on the extent of 
misinformation in beliefs as a function of information bottlenecks in the social network. 

Our paper is related to a large and growing learning literature. Much of this literature 
focuses on various Bayesian models of observational or communication-based learning; 
for example Bikhchandani, Hirshleifer and Welch [8], Banerjee [6], Smith and Sorensen
[36], [35], Banerjee and Fudenberg [7], Bala and Goyal [1], [5], Gale and Kariv [18], and
Celen and Kariv [12], [H]. These papers develop models of social learning either using 
a Bayesian perspective or exploiting some plausible rule-of-thumb behavior. Acemoglu, 
Dahleh, Lobel and Ozdaglar [1] provide an analysis of Bayesian learning over general 
social networks. Our paper is most closely related to DeGroot [15], DeMarzo, Vayanos 
and Zwiebel [16] and Golub and Jackson [21], [20], who also consider non-Bayesian 
learning over a social network represented by a connected graph.^9 None of the papers
mentioned above consider the issue of the spread of misinformation (or the tension 
between aggregation of information and spread of misinformation), though there are 
close parallels between Golub and Jackson's and our characterizations of influence.^10 In

^9 An important distinction is that in contrast to the "averaging" model used in these papers, we
have a model of pairwise interactions. We believe that this model has a more attractive economic 
interpretation, since it does not have the feature that neighbors' information will be averaged at each 
date (even though the same information was exchanged the previous period). In contrast, in the pairwise 
meeting model (without forceful agents), if a pair meets two periods in a row, in the second meeting 
there is no information to exchange and no change in beliefs takes place. 

^10 In particular, Golub and Jackson [20] characterize the effects of homophily on learning and influence
in two different models of learning in terms of mixing properties and the spectral gap of graphs. In one
of their learning models, which builds on DeGroot [15], DeMarzo, Vayanos and Zwiebel [16] and Golub
and Jackson [21], homophily has negative effects on learning (and speed of learning) for reasons related
to our finding that in slow-mixing graphs, misinformation can spread more. 
addition to our focus, the methods of analysis here, which develop bounds on the extent
of misinformation and provide exact characterization of excess influence in certain classes
of social networks, are entirely new in the literature and also rely on the development
of new results in the analysis of Markov chains. 

Our work is also related to other work in the economics of communication, in par- 
ticular, to cheap-talk models based on Crawford and Sobel [T^ (see also Farrell and 
Gibbons [17] and Sobel [37]), and some recent learning papers incorporating cheap-talk 
games into a network structure (see Ambrus and Takahashi [3], Hagenbach and Koessler
[22], and Galeotti, Ghiglino and Squintani [H]). 

In addition to the papers on learning mentioned above, our paper is related to work 
on consensus, which is motivated by different problems, but typically leads to a similar 
mathematical formulation (Tsitsiklis [38], Tsitsiklis, Bertsekas and Athans [39],
Jadbabaie, Lin and Morse [25], Olfati-Saber and Murray [29], Olshevsky and Tsitsiklis [30],
Nedic and Ozdaglar [21]). In consensus problems, the focus is on whether the beliefs
or the values held by different units (which might correspond to individuals, sensors or 
distributed processors) converge to a common value. Our analysis here focuses not only
on consensus, but also on whether the consensus happens around the true value of
the underlying state. There are also no parallels in this literature to our bounds on 
misinformation and characterization results. 

The rest of this paper is organized as follows: In Section 2, we introduce our model of
interaction between the agents and describe the resulting evolution of individual beliefs.
We also state our assumptions on connectivity and information exchange between the
agents. Section 3 presents our main convergence result on the evolution of agent beliefs
over time. In Section 4, we provide bounds on the extent of misinformation as a function
of the global network parameters. Section 5 focuses on the effects of location of forceful
links on the spread of misinformation and provides bounds as a function of the local
connectivity and location of forceful agents in the network. Section 6 contains our
concluding remarks.

Notation and Terminology: A vector is viewed as a column vector, unless clearly
stated otherwise. We denote by $x_i$ or $[x]_i$ the $i$th component of a vector $x$. When $x_i \ge 0$
for all components $i$ of a vector $x$, we write $x \ge 0$. For a matrix $A$, we write $A_{ij}$ or $[A]_{ij}$
to denote the matrix entry in the $i$th row and $j$th column. We write $x'$ to denote the
transpose of a vector $x$. The scalar product of two vectors $x, y \in \mathbb{R}^m$ is denoted by $x'y$.
We use $\|x\|_2$ to denote the standard Euclidean norm, $\|x\|_2 = \sqrt{x'x}$. We write $\|x\|_\infty$ to
denote the max norm, $\|x\|_\infty = \max_{1 \le i \le m} |x_i|$. We use $e_i$ to denote the vector with $i$th
entry equal to 1 and all other entries equal to 0. We denote by $e$ the vector with all
entries equal to 1.

A vector $a$ is said to be a stochastic vector when $a_i \ge 0$ for all $i$ and $\sum_i a_i = 1$.
A square matrix A is said to be a (row) stochastic matrix when each row of A is a 
stochastic vector. The transpose of a matrix A is denoted by A'. A square matrix A is 
said to be a doubly stochastic matrix when both A and A' are stochastic matrices. 
2 Belief Evolution

2.1 Description of the Environment 

We consider a set $\mathcal{N} = \{1, \ldots, n\}$ of agents interacting over a social network. Each
agent $i$ starts with an initial belief about an underlying state, which we denote by
$x_i(0) \in \mathbb{R}$. Agents exchange information with their neighbors and update their beliefs.
We assume that there are two types of agents: regular and forceful. Regular agents
exchange information with their neighbors (when they meet). In contrast, forceful agents 
influence others disproportionately. 

We use an asynchronous continuous-time model to represent meetings between agents 
(also studied in Boyd et al. [9] in the context of communication networks). In particular,
we assume that each agent meets (communicates with) other agents at instances defined
by a rate one Poisson process independent of other agents. This implies that the meeting
instances (over all agents) occur according to a rate $n$ Poisson process at times $t_k$, $k \ge 1$.
Note that in this model, by convention, at most one node is active (i.e., is meeting
another) at a given time. We discretize time according to meeting instances (since these
are the relevant instances at which the beliefs change), and refer to the interval $[t_k, t_{k+1})$
as the $k$th time slot. On average, there are $n$ meeting instances per unit of absolute time
(see Boyd et al. [9] for a precise relation between these instances and absolute time).
Suppose that at time (slot) $k$, agent $i$ is chosen to meet another agent (probability
$1/n$). In this case, agent $i$ will meet agent $j \in \mathcal{N}$ with probability $p_{ij}$. Following a
meeting between $i$ and $j$, there is a potential exchange of information. Throughout,
we assume that all events that happen in a meeting are independent of any other event
that happened in the past. Let $x_i(k)$ denote the belief of agent $i$ about the underlying
state at time k. The agents update their beliefs according to one of the following three 
possibilities. 

(i) Agents $i$ and $j$ reach pairwise consensus and the beliefs are updated according to

$x_i(k+1) = x_j(k+1) = \frac{x_i(k) + x_j(k)}{2}.$

We denote the conditional probability of this event (conditional on $i$ meeting $j$)
as $\beta_{ij}$.

(ii) Agent $j$ influences agent $i$, in which case for some $\epsilon \in (0, 1/2]$,^11 beliefs change
according to

$x_i(k+1) = \epsilon x_i(k) + (1 - \epsilon) x_j(k)$, and $x_j(k+1) = x_j(k)$.

We denote the conditional probability of this event as $\alpha_{ij}$, and refer to it as the
influence probability. Note that we allow $\epsilon = 1/2$, so that agent $i$ may be treating
agent $j$ just as a regular agent, except that agent $j$ himself does not change his beliefs.

(iii) Agents $i$ and $j$ do not agree and stick to their beliefs, i.e.,

$x_i(k+1) = x_i(k)$, and $x_j(k+1) = x_j(k)$.

This event has probability $\gamma_{ij} = 1 - \beta_{ij} - \alpha_{ij}$.

^11 We could allow the self belief weight $\epsilon$ to be different for each agent $i$. This generality does not
change the results or the economic intuitions, so for notational convenience, we assume this weight to
be the same across all agents.

Any agent $j$ for whom the influence probability $\alpha_{ij} > 0$ for some $i \in \mathcal{N}$ is referred
to as a forceful agent. Moreover, the directed link $(j, i)$ is referred to as a forceful link.^12

As discussed in the introduction, we can interpret forceful agents in multiple different 
ways. First, forceful agents may correspond to community leaders or news media, who will
have a disproportionate effect on the beliefs of their followers. In such cases, it is natural
to consider $\epsilon$ small and the leaders or media not updating their own beliefs as a result
of others listening to their opinion. Second, forceful agents may be indistinguishable 
from regular agents, and thus regular agents engage in what they think is information 
exchange, but forceful agents, because of stubbornness or some other motive, do not 
incorporate the information of these agents in their own beliefs. In this case, it may be 
natural to think of $\epsilon$ as equal to 1/2. The results that follow remain valid with either
interpretation. 

The influence structure described above will determine the evolution of beliefs in 
the society. Below, we will give a more precise separation of this evolution into two 
components, one related to the underlying social network (communication and meeting 
structure), and the other to influence patterns. 
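
The three possible outcomes of a meeting described in this subsection can be sketched as a single update step. This is an illustrative sketch under stated assumptions, not code from the paper: the function name `meet` and its argument names are hypothetical, agent `i` is the one who was recognized, and the outcome is drawn by comparing one uniform random number with the conditional probabilities $\beta_{ij}$ (`beta`) and $\alpha_{ij}$ (`alpha`).

```python
import random


def meet(x, i, j, beta, alpha, eps, rng):
    """Return the belief list after agent i (recognized) meets agent j."""
    x = list(x)                       # leave the caller's list untouched
    u = rng.random()                  # uniform draw in [0, 1)
    if u < beta:
        # (i) pairwise consensus: both agents adopt the average
        x[i] = x[j] = (x[i] + x[j]) / 2
    elif u < beta + alpha:
        # (ii) agent j influences agent i; agent j keeps his belief
        x[i] = eps * x[i] + (1 - eps) * x[j]
    # (iii) otherwise, with probability 1 - beta - alpha, nothing changes
    return x


beliefs = [1.0, 0.0]
averaged = meet(beliefs, 0, 1, beta=1.0, alpha=0.0, eps=0.25, rng=random.Random(0))
influenced = meet(beliefs, 0, 1, beta=0.0, alpha=1.0, eps=0.25, rng=random.Random(0))
unchanged = meet(beliefs, 0, 1, beta=0.0, alpha=0.0, eps=0.25, rng=random.Random(0))
```

Setting one of the probabilities to 1 forces the corresponding outcome, which makes each branch easy to inspect in isolation.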

2.2 Assumptions 

We next state our assumptions on the belief evolution model among the agents. We 
have the following assumption on the agent meeting probabilities $p_{ij}$.

Assumption 1. (Meeting Probabilities) 

(a) For all $i$, the probabilities $p_{ii}$ are equal to 0.

(b) For all $i$, the probabilities $p_{ij}$ are nonnegative for all $j$ and they sum to 1 over $j$,
i.e.,

$\sum_{j=1}^{n} p_{ij} = 1$ for all $i$.

^12 We refer to directed links/edges as links and undirected ones as edges.
Assumption 1(a) imposes that "self-communication" is not a possibility, though this
is just a convention, since, as stated above, we allow disagreement among agents, i.e.,
$\gamma_{ij}$ can be positive. We let $P$ denote the matrix with entries $p_{ij}$. Under Assumption
1(b), the matrix $P$ is a stochastic matrix.^13
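
Both parts of Assumption 1 are simple to check for a concrete meeting matrix. The sketch below is only an illustration: the function name `satisfies_assumption_1` and the two example matrices are hypothetical, and exact fractions are used so that the row sums can be compared with 1 without rounding error.

```python
from fractions import Fraction


def satisfies_assumption_1(P):
    """Check p_ii = 0, p_ij >= 0, and that every row of P sums to 1."""
    n = len(P)
    zero_diagonal = all(P[i][i] == 0 for i in range(n))
    nonnegative = all(p >= 0 for row in P for p in row)
    rows_sum_to_one = all(sum(row) == 1 for row in P)
    return zero_diagonal and nonnegative and rows_sum_to_one


half = Fraction(1, 2)
# Hypothetical 3-agent society: everyone meets each other agent equally often.
P_ok = [[0, half, half], [half, 0, half], [half, half, 0]]
# Same society, but agent 0 "meets himself", which part (a) rules out.
P_self = [[half, half, 0], [half, 0, half], [half, half, 0]]
```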

We next impose a connectivity assumption on the social network. This assumption is 
stated in terms of the directed graph $(\mathcal{N}, \mathcal{E})$, where $\mathcal{E}$ is the set of directed links induced
by the positive meeting probabilities $p_{ij}$, i.e.,

$\mathcal{E} = \{ (i, j) \mid p_{ij} > 0 \}.$    (2)

Assumption 2. (Connectivity) The graph $(\mathcal{N}, \mathcal{E})$ is strongly connected, i.e., for all
$i, j \in \mathcal{N}$, there exists a directed path connecting $i$ to $j$ with links in the set $\mathcal{E}$.

Assumption 2 ensures that every agent "communicates" with every other agent (possibly
through multiple links). This is not an innocuous assumption, since otherwise the
graph $(\mathcal{N}, \mathcal{E})$ (and the society that it represents) would segment into multiple
non-communicating parts. Though not innocuous, this assumption is also natural for several
reasons. First, the evidence suggests that most subsets of the society are not only
connected, but are connected by means of several links (e.g., Watts [IQ] and Jackson [M]),
and the same seems to be true for indirect linkages via the Internet. Second, if the society
is segmented into multiple non-communicating parts, the insights here would apply,
with some modifications, to each of these parts.

Let us also use $d_{ij}$ to denote the length of the shortest path from $i$ to $j$ and $d$ to
denote the maximum shortest path length between any $i, j \in \mathcal{N}$, i.e.,

$d = \max_{i, j \in \mathcal{N}} d_{ij}.$    (3)

In view of Assumption 2, these are all well-defined objects.
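
Strong connectivity and the quantity $d$ can both be obtained by running a breadth-first search from every agent over the links with positive meeting probability. The sketch below is an assumption-laden illustration rather than anything prescribed by the paper: the function names and the two example matrices are hypothetical, and `None` is returned when the graph is not strongly connected, in which case $d$ is not defined.

```python
from collections import deque


def shortest_path_lengths(P, source):
    """Breadth-first search over links (u, v) with P[u][v] > 0; returns {node: distance}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(len(P)):
            if P[u][v] > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_shortest_path_length(P):
    """Return d = max over i, j of d_ij, or None if some agent cannot reach another."""
    n = len(P)
    d = 0
    for i in range(n):
        dist = shortest_path_lengths(P, i)
        if len(dist) < n:             # some agent is unreachable from i
            return None
        d = max(d, max(dist.values()))
    return d


# Hypothetical directed ring 0 -> 1 -> 2 -> 0: strongly connected.
P_ring = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Hypothetical society in which nobody ever meets agent 2.
P_split = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
d_ring = max_shortest_path_length(P_ring)
d_split = max_shortest_path_length(P_split)
```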

Finally, we introduce the following assumption which ensures that there is positive 
probability that every agent (even if he is forceful) receives some information from an 
agent in his neighborhood. 

Assumption 3. (Interaction Probabilities) For all $(i, j) \in \mathcal{E}$, the sum of the averaging
probability $\beta_{ij}$ and the influence probability $\alpha_{ij}$ is positive, i.e.,

$\beta_{ij} + \alpha_{ij} > 0$ for all $(i, j) \in \mathcal{E}$.

The connectivity assumption (Assumption 2) ensures that there is a path from any
forceful agent to other agents in the network, implying that for any forceful agent $i$, there
is a link $(i, j) \in \mathcal{E}$ for some $j \in \mathcal{N}$. Then the main role of Assumption 3 is to guarantee
that even the forceful agents at some point get information from the other agents in
the network.^14 This assumption captures the idea that "no man is an island," i.e., even
the beliefs of forceful agents are affected by the beliefs of the society. In the absence of
this assumption, any society consisting of several forceful agents may never settle into
a stationary distribution of beliefs. While this is an interesting situation to investigate,
it requires a very different approach. Since we view the "no man is an island" feature
as plausible, we find Assumption 3 a useful starting point.

^13 That is, its row sums are equal to 1.

LIDS Report 2812

Throughout the rest of the paper, we assume that Assumptions 1, 2 and 3 hold.

2.3 Evolution of Beliefs: Social Network and Influence Matrices

We can express the preceding belief update model compactly as follows. Let
$x(k) = (x_1(k), \ldots, x_n(k))$ denote the vector of agent beliefs at time $k$. The agent beliefs are
updated according to the relation

$x(k+1) = W(k) x(k),$    (4)

where $W(k)$ is a random matrix given by

$W(k) = \begin{cases} A_{ij} \equiv I - \frac{(e_i - e_j)(e_i - e_j)'}{2} & \text{with probability } p_{ij} \beta_{ij} / n, \\ J_{ij} \equiv I - (1 - \epsilon) e_i (e_i - e_j)' & \text{with probability } p_{ij} \alpha_{ij} / n, \\ I & \text{with probability } p_{ij} \gamma_{ij} / n, \end{cases}$    (5)

for all $i, j \in \mathcal{N}$. The preceding belief update model implies that the matrix $W(k)$ is a
stochastic matrix for all $k$, and is independent and identically distributed over all $k$.
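
To see that the matrices in Eq. (5) reproduce the pairwise update rules, one can build them explicitly and apply them to a belief vector. The following sketch assumes the forms $A_{ij} = I - (e_i - e_j)(e_i - e_j)'/2$ and $J_{ij} = I - (1 - \epsilon) e_i (e_i - e_j)'$; the helper names (`identity`, `averaging_matrix`, `influence_matrix`, `matvec`) and the numbers are hypothetical.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]


def averaging_matrix(n, i, j):
    """A_ij = I - (e_i - e_j)(e_i - e_j)'/2: agents i and j both move to their average."""
    A = identity(n)
    half = Fraction(1, 2)
    A[i][i] -= half
    A[j][j] -= half
    A[i][j] += half
    A[j][i] += half
    return A


def influence_matrix(n, i, j, eps):
    """J_ij = I - (1 - eps) e_i (e_i - e_j)': only agent i moves, toward agent j."""
    J = identity(n)
    J[i][i] -= 1 - eps
    J[i][j] += 1 - eps
    return J


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


x = [Fraction(4), Fraction(7), Fraction(0)]
eps = Fraction(1, 4)
after_averaging = matvec(averaging_matrix(3, 0, 2), x)       # agents 0 and 2 average
after_influence = matvec(influence_matrix(3, 0, 2, eps), x)  # agent 2 influences agent 0
```

Every row of both matrices sums to one, which is the row-stochasticity of $W(k)$ noted in the text.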
Let us introduce the matrices

$\Phi(k, s) = W(k) W(k-1) \cdots W(s+1) W(s)$ for all $k$ and $s$ with $k \ge s$,    (6)

with $\Phi(k, k) = W(k)$ for all $k$. We will refer to the matrices $\Phi(k, s)$ as the transition
matrices. We can now write the belief update rule (4) as follows: for all $s$ and $k$ with
$k \ge s \ge 0$ and all agents $i \in \{1, \ldots, n\}$,

$x_i(k+1) = \sum_{j=1}^{n} [\Phi(k, s)]_{ij} x_j(s).$    (7)
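
The relation between the one-step recursion (4) and the transition matrices in Eqs. (6)-(7) can be checked on a short, fixed sequence of update matrices. This is a sketch under stated assumptions: the function names and the three example matrices are hypothetical, and the sequence is fixed by hand rather than drawn at random.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transition_matrix(W_seq, k, s):
    """Phi(k, s) = W(k) W(k-1) ... W(s+1) W(s) for k >= s, with W_seq[t] = W(t)."""
    Phi = W_seq[s]
    for t in range(s + 1, k + 1):
        Phi = matmul(W_seq[t], Phi)   # later matrices multiply from the left
    return Phi


half, eps = Fraction(1, 2), Fraction(1, 4)
W_seq = [
    [[half, half, 0], [half, half, 0], [0, 0, 1]],   # slot 0: agents 0 and 1 average
    [[1, 0, 0], [0, 1, 0], [1 - eps, 0, eps]],       # slot 1: agent 0 influences agent 2
    [[1, 0, 0], [0, half, half], [0, half, half]],   # slot 2: agents 1 and 2 average
]
x0 = [Fraction(4), Fraction(0), Fraction(8)]

# Step-by-step recursion x(k+1) = W(k) x(k).
x_stepwise = x0
for W in W_seq:
    x_stepwise = matvec(W, x_stepwise)

# The same vector x(3) obtained in one shot through Phi(2, 0).
x_via_phi = matvec(transition_matrix(W_seq, 2, 0), x0)
```

Because each factor is row stochastic, so is the product, and every entry of $x(k+1)$ is a convex combination of the entries of $x(s)$.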

Given our assumptions, the random matrix $W(k)$ is identically distributed over all
$k$, and thus we have for some nonnegative matrix $W$,

$E[W(k)] = W$ for all $k \ge 0$.    (8)

^14 This assumption is stated for all $(i, j) \in \mathcal{E}$, thus a forceful agent $i$ receives some information from
any $j$ in his "neighborhood". This is without any loss of generality, since we can always set $p_{ij} = 0$ for
those $j$'s that are in $i$'s neighborhood but from whom $i$ never obtains information.
The matrix, $W$, which we refer to as the mean interaction matrix, represents the evolution
of beliefs in the society. It incorporates elements from both the underlying social
network (which determines the meeting patterns) and the influence structure. In what
follows, it will be useful to separate these into two components, both for our mathematical
analysis and to clarify the intuitions. For this purpose, let us use the belief update
model (4)-(5) and write the mean interaction matrix $W$ as follows:^15



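$W = \frac{1}{n} \sum_{i,j} p_{ij} \left[ \beta_{ij} A_{ij} + \alpha_{ij} J_{ij} + \gamma_{ij} I \right]$

$\phantom{W} = \frac{1}{n} \sum_{i,j} p_{ij} \left[ (1 - \gamma_{ij}) A_{ij} + \gamma_{ij} I \right] + \frac{1}{n} \sum_{i,j} p_{ij} \alpha_{ij} \left[ J_{ij} - A_{ij} \right],$    (9)

where $A_{ij}$ and $J_{ij}$ are matrices defined in Eq. (5), and the second equality follows from
the fact that $\beta_{ij} = 1 - \alpha_{ij} - \gamma_{ij}$ for all $i, j \in \mathcal{N}$. We use the notation

$T = \frac{1}{n} \sum_{i,j} p_{ij} \left[ (1 - \gamma_{ij}) A_{ij} + \gamma_{ij} I \right], \qquad D = \frac{1}{n} \sum_{i,j} p_{ij} \alpha_{ij} \left[ J_{ij} - A_{ij} \right],$

to write the mean interaction matrix, $W$, as

$W = T + D.$    (10)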

Here, the matrix $T$ only depends on meeting probabilities (matrix $P$) except that
it also incorporates $\gamma_{ij}$ (probability that following a meeting no exchange takes place).
We can therefore think of the matrix T as representing the underlying social network 
(friendships, communication among coworkers, decisions about which news outlets to 
watch, etc.), and refer to it as the social network matrix. It will be useful below to 
represent the social interactions using an undirected (and weighted) graph induced by 
the social network matrix $T$. This graph is given by $(\mathcal{N}, \mathcal{A})$, where $\mathcal{A}$ is the set of
undirected edges given by

$\mathcal{A} = \{ \{i, j\} \mid T_{ij} > 0 \},$    (11)

and the weight $w_e$ of edge $e = \{i, j\}$ is given by the entry $T_{ij} = T_{ji}$ of the matrix $T$. We
refer to this graph as the social network graph.

The matrix $D$, on the other hand, can be thought of as representing the influence structure in the society. It incorporates information about which individuals and links are forceful (i.e., which types of interactions will lead to one individual influencing the other without updating his own beliefs). We refer to matrix $D$ as the influence matrix. It is also useful to note for interpreting the mathematical results below that $T$ is a doubly stochastic matrix, while $D$ is not. Therefore, Eq. (10) gives a decomposition of the mean interaction matrix $W$ into a doubly stochastic and a remainder component, and enables us to use tools from matrix perturbation theory (see Section 4).

In the sequel, the notation $\sum_{i,j}$ will be used to denote the double sum $\sum_{i=1}^{n}\sum_{j=1}^{n}$.
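The decomposition can be checked numerically on a toy society. The following is a minimal sketch, not the authors' code: it assumes that $A_{ij}$ replaces both beliefs by their average and that $J_{ij}$ sets $x_i \leftarrow \epsilon x_i + (1-\epsilon)x_j$ while leaving $x_j$ unchanged (the form consistent with Eq. (18)). The function name `build_T_D`, the three-agent network, and all parameter values are hypothetical.

```python
def build_T_D(p, alpha, gamma, eps):
    """Return (T, D) such that W = T + D, following Eq. (9).

    p[i][j]     : probability that agent i meets agent j
    alpha[i][j] : probability that j acts forcefully towards i
    gamma[i][j] : probability that no exchange takes place
    """
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or p[i][j] == 0.0:
                continue
            w = p[i][j] / n                    # weight of the ordered pair (i, j)
            for l in range(n):                 # start from w * I
                T[l][l] += w
            a = 0.5 * w * (1.0 - gamma[i][j])  # add w * (1 - gamma_ij) * (A_ij - I)
            T[i][i] -= a
            T[j][j] -= a
            T[i][j] += a
            T[j][i] += a
            f = w * alpha[i][j]                # add f * (J_ij - A_ij) to D
            D[i][i] += f * (eps - 0.5)
            D[i][j] += f * (0.5 - eps)
            D[j][i] -= f * 0.5
            D[j][j] += f * 0.5
    return T, D


# Hypothetical example: three agents who meet each other equally often;
# agent 1 is forceful towards agent 0 in half of the meetings initiated by 0.
n = 3
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
gamma = [[0.0] * n for _ in range(n)]
alpha = [[0.0] * n for _ in range(n)]
alpha[0][1] = 0.5
T, D = build_T_D(p, alpha, gamma, eps=0.1)

row_sums_T = [sum(row) for row in T]
col_sums_T = [sum(T[r][c] for r in range(n)) for c in range(n)]
row_sums_D = [sum(row) for row in D]
col_sums_D = [sum(D[r][c] for r in range(n)) for c in range(n)]
```

In this sketch the rows and columns of `T` each sum to one, the rows of `D` sum to zero, and the columns of `D` do not, which is the sense in which $T$ is doubly stochastic while $D$ is a zero-row-sum perturbation.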
3 Convergence

In this section, we provide our main convergence result. In particular, we show that despite the presence of forceful agents, with potentially very different opinions at the beginning, the society will ultimately converge to a consensus, in which all individuals share the same belief. This consensus value of beliefs itself is a random variable. We also provide a first characterization of the expected value of this consensus belief in terms of the mean interaction matrix (and thus social network and influence matrices). Our analysis essentially relies on showing that iterates of Eq. (4), $x(k)$, converge to a consensus with probability one, i.e., $x(k) \to \bar{x}e$, where $\bar{x}$ is a scalar random variable that depends on the initial beliefs and the random sequence of matrices $\{W(k)\}$, and $e$ is the vector of all ones. The proof uses two lemmas which are presented in Appendix B.

Theorem 1. The sequences $\{x_i(k)\}$, $i \in \mathcal{N}$, generated by Eq. (4) converge to a consensus belief, i.e., there exists a scalar random variable $\bar{x}$ such that

\[ \lim_{k\to\infty} x_i(k) = \bar{x} \quad \text{for all } i \text{ with probability one.} \]

Moreover, the random variable $\bar{x}$ is a convex combination of initial agent beliefs, i.e.,

\[ \bar{x} = \sum_{j=1}^{n} \pi_j x_j(0), \]

where $\pi = [\pi_1, \ldots, \pi_n]$ is a random vector that satisfies $\pi_j \ge 0$ for all $j$, and $\sum_{j=1}^{n} \pi_j = 1$.



Proof. By Lemma [9] from Appendix B, we have 

P !^ms + n^d-l,s)]ij>^e'''-\ foralH,j|> (^^^ for all s > 0, 

where <l>(s + n'^d — 1, s) is a transition matrix [cf. Eq. ([SD], d is the maximum shortest 
path length in graph {Af,S) [cf. Eq. ([3])], e is the self belief weight against a forceful 
agent [cf. Eq. ([1])], and 77 is a positive scalar defined in Eq. psl) . This relation implies 
that over a window of length n'^d, all entries of the transition matrix $(s + n'^d — l,s) 
are strictly positive with positive probability, which is uniformly bounded away from 0. 
Thus, we can use Lemma M (from Appendix A) with the identifications 



Letting 



H{k) = W{k), B = n^d, 9 = ^e"'-^ 



Mik) = maxxj(/c), m(k) = min Xi(k), 
d 2

this implies that n-^e" ~^ < 1 and for all s > 0, 



P ^^M{s + n'^d) - m{s + n'^d) < (1 - nr/72 e"'-^)(M(s) - m(s))} > 

Moreover, by the stochasticity of the matrix W{k), it follows that the sequence {M{k) — 
m{k)} is nonincreasing with probability one. Hence, we have for all s > 



E 



M{s+n^d)-m{s+n^d) 



< 



(M(s)-m(s)), 



from which, for any A; > 0, we obtain 



E 



M{k) - m{k) 



< 



l-^l 



(M(0) -m(0)). 



This implies that

\[ \lim_{k\to\infty} \big(M(k) - m(k)\big) = 0 \quad \text{with probability one.} \]

The stochasticity of the matrix $W(k)$ further implies that the sequences $\{M(k)\}$ and $\{m(k)\}$ are bounded and monotone and therefore converge to the same limit, which we denote by $\bar{x}$. Since we have

\[ m(k) \le x_i(k) \le M(k) \quad \text{for all } i \text{ and } k \ge 0, \]

it follows that

\[ \lim_{k\to\infty} x_i(k) = \bar{x} \quad \text{for all } i \text{ with probability one,} \]

establishing the first result.

Letting $s = 0$ in Eq. (7), we have for all $i$

\[ x_i(k) = \sum_{j=1}^{n} [\Phi(k-1,0)]_{ij}\, x_j(0) \quad \text{for all } k \ge 0. \qquad (12) \]

From the previous part, for any initial belief vector $x(0)$, the limit

\[ \lim_{k\to\infty} x_i(k) = \sum_{j=1}^{n} \lim_{k\to\infty} [\Phi(k-1,0)]_{ij}\, x_j(0) \]

exists and is independent of $i$. Hence, for any $h$, we can choose $x(0) = e_h$, i.e., $x_h(0) = 1$ and $x_j(0) = 0$ for all $j \ne h$, implying that the limit

\[ \lim_{k\to\infty} [\Phi(k-1,0)]_{ih} \]

exists and is independent of $i$. Denoting this limit by $\pi_h$ and using Eq. (12), we obtain the desired result, where the properties of the vector $\pi = [\pi_1, \ldots, \pi_n]$ follow from the stochasticity of matrix $\Phi(k,0)$ for all $k$ (implying the stochasticity of its limit as $k \to \infty$). □
The key implication of this result is that, despite the presence of forceful agents, the society will ultimately reach a consensus. Though surprising at first, this result is intuitive in light of our "no man is an island" assumption (Assumption 3). However, in contrast to "averaging models" used both in the engineering literature and recently in the learning literature, the consensus value here is a random variable and will depend on the order in which meetings have taken place. The main role of this result for us is that we can now conduct our analysis on quantifying the extent of the spread of misinformation by looking at this consensus value of beliefs.
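The randomness of the consensus value can be seen in a small simulation. The sketch below is an illustration under stated assumptions rather than the paper's procedure: at each step one agent $i$ is picked uniformly at random (which is where the $1/n$ factor in the mean interaction matrix would come from), meets $j$ with probability $p_{ij}$, and then with probability $\alpha_{ij}$ is influenced by $j$, with probability $\gamma_{ij}$ nothing happens, and otherwise both average. The function name `simulate`, the four-agent network, and the parameter values are hypothetical.

```python
import random


def simulate(x0, p, alpha, gamma, eps, steps, seed):
    """Run pairwise belief updates and return the final belief vector."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        i = rng.randrange(n)                        # agent i is picked uniformly
        j = rng.choices(range(n), weights=p[i])[0]  # i meets j with probability p_ij
        u = rng.random()
        if u < alpha[i][j]:                         # j influences i and keeps his belief
            x[i] = eps * x[i] + (1.0 - eps) * x[j]
        elif u < alpha[i][j] + gamma[i][j]:         # meeting without exchange
            continue
        else:                                       # both adopt the average
            x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x


# Hypothetical society: four agents on a complete graph, agent 1 is
# forceful towards agent 0 in half of the meetings initiated by agent 0.
n = 4
p = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
gamma = [[0.0] * n for _ in range(n)]
alpha = [[0.0] * n for _ in range(n)]
alpha[0][1] = 0.5
x0 = [0.0, 1.0, 2.0, 3.0]

final_a = simulate(x0, p, alpha, gamma, eps=0.1, steps=3000, seed=1)
final_b = simulate(x0, p, alpha, gamma, eps=0.1, steps=3000, seed=2)
print("consensus (seed 1):", final_a[0])
print("consensus (seed 2):", final_b[0])
```

Within each run all entries of the final vector agree up to floating-point error, and the common value lies between the smallest and largest initial belief, as Theorem 1 states. Different seeds correspond to different meeting orders and will in general give different consensus values.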

The next theorem characterizes $E[\bar{x}]$ in terms of the limiting behavior of the matrices $W^k$ as $k$ goes to infinity.

Theorem 2. Let $\bar{x}$ be the limiting random variable of the sequences $\{x_i(k)\}$, $i \in \mathcal{N}$, generated by Eq. (4) (cf. Theorem 1). Then we have:

(a) The matrix $W^k$ converges to a stochastic matrix with identical rows $\bar{\pi}$ as $k$ goes to infinity, i.e.,

\[ \lim_{k\to\infty} W^k = e\bar{\pi}'. \]

(b) The expected value of $\bar{x}$ is given by a convex combination of the initial agent values $x_i(0)$, where the weights are given by the components of the probability vector $\bar{\pi}$, i.e.,

\[ E[\bar{x}] = \sum_{i=1}^{n} \bar{\pi}_i x_i(0) = \bar{\pi}' x(0). \]

Proof. (a) This part relies on the properties of the mean interaction matrix established in Appendix B. In particular, by Lemma 7(a), the mean interaction matrix $W$ is a primitive matrix. Therefore, the Markov Chain with transition probability matrix $W$ is regular (see Section 4.1 for a definition). The result follows immediately from Theorem 3(a).

(b) From Eq. (7), we have for all $k \ge 0$

\[ x(k) = \Phi(k-1,0)\,x(0). \]

Moreover, since $x(k) \to \bar{x}e$ as $k \to \infty$, we have

\[ E[\bar{x}e] = E\big[\lim_{k\to\infty} x(k)\big] = \lim_{k\to\infty} E[x(k)], \]

where the second equality follows from Lebesgue's Dominated Convergence Theorem (see [21]). Combining the preceding two relations and using the assumption that the matrices $W(k)$ are independent and identically distributed over all $k \ge 0$, we obtain

\[ E[\bar{x}e] = \lim_{k\to\infty} E[\Phi(k-1,0)\,x(0)] = \lim_{k\to\infty} W^k x(0), \]

which in view of part (a) implies

\[ E[\bar{x}] = \bar{\pi}' x(0). \]

□
Combining Theorem 1 and Theorem 2(a) (and using the fact that the results hold for any $x(0)$), we have $\bar{\pi} = E[\pi]$. The stationary distribution $\bar{\pi}$ is crucial in understanding the formation of opinions since it encapsulates the weight given to each agent (forceful or regular) in the (limiting) mean consensus value of the society. We refer to the vector $\bar{\pi}$ as the consensus distribution corresponding to the mean interaction matrix $W$ and its component $\bar{\pi}_i$ as the weight of agent $i$.
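The consensus distribution can be approximated numerically by iterating $\bar{\pi}' \leftarrow \bar{\pi}' W$. The sketch below is illustrative only. It assumes the forms of $A_{ij}$ and $J_{ij}$ implied by Eq. (18) (averaging, and $x_i \leftarrow \epsilon x_i + (1-\epsilon)x_j$ with $x_j$ unchanged); the function names, the three-agent society, and the parameter values are hypothetical.

```python
def mean_interaction_matrix(p, alpha, gamma, eps):
    """W = (1/n) * sum_{i,j} p_ij [beta_ij A_ij + alpha_ij J_ij + gamma_ij I]."""
    n = len(p)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = p[i][j] / n
            if i == j or w == 0.0:
                continue
            beta = 1.0 - alpha[i][j] - gamma[i][j]
            for l in range(n):
                W[l][l] += w                   # identity part of every outcome
            half = 0.5 * w * beta              # averaging outcome: beta * (A_ij - I)
            W[i][i] -= half
            W[j][j] -= half
            W[i][j] += half
            W[j][i] += half
            f = w * alpha[i][j] * (1.0 - eps)  # forceful outcome: alpha * (J_ij - I)
            W[i][i] -= f
            W[i][j] += f
    return W


def stationary(W, iters=2000):
    """Approximate the left eigenvector pi' W = pi' by repeated multiplication."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[r] * W[r][c] for r in range(n)) for c in range(n)]
    return pi


n = 3
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
gamma = [[0.0] * n for _ in range(n)]
no_force = [[0.0] * n for _ in range(n)]
alpha = [[0.0] * n for _ in range(n)]
alpha[0][1] = 0.5                              # agent 1 is forceful towards agent 0

pi_uniform = stationary(mean_interaction_matrix(p, no_force, gamma, eps=0.1))
pi_bar = stationary(mean_interaction_matrix(p, alpha, gamma, eps=0.1))

x0 = [1.0, 0.0, 0.0]                           # hypothetical initial beliefs
expected_consensus = sum(w * x for w, x in zip(pi_bar, x0))
print(pi_bar)  # approximately [0.2825, 0.3842, 0.3333]
```

In this toy case the forceful agent 1 receives weight $68/177 > 1/3$, the influenced agent 0 receives $50/177 < 1/3$, and the uninvolved agent keeps weight $1/3$. By Theorem 2(b), `expected_consensus` is the mean of the random consensus value for the chosen `x0`.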

It is also useful at this point to highlight how consensus will form around the correct value in the absence of forceful agents. Let $\{x(k)\}$ be the belief sequence generated by the belief update rule of Eq. (4). When there are no forceful agents, i.e., $\alpha_{ij} = 0$ for all $i,j$, the interaction matrix $W(k)$ for all $k$ is either equal to an averaging matrix $A_{ij}$ for some $i,j$ or equal to the identity matrix $I$; hence, $W(k)$ is a doubly stochastic matrix. This implies that the average value of $x(k)$ remains constant at each iteration, i.e.,

\[ \frac{1}{n}\sum_{i=1}^{n} x_i(k) = \frac{1}{n}\sum_{i=1}^{n} x_i(0) \quad \text{for all } k \ge 0. \]

Theorem 1 therefore shows that when there are no forceful agents, the sequences $x_i(k)$ for all $i$ converge to the average of the initial beliefs with probability one, aggregating information. We state this result as a simple corollary.

Corollary 1. Assume that there are no forceful agents, i.e., $\alpha_{ij} = 0$ for all $i,j \in \mathcal{N}$. We have

\[ \lim_{k\to\infty} x_i(k) = \frac{1}{n}\sum_{i=1}^{n} x_i(0) = \theta \quad \text{with probability one.} \]

Therefore, in the absence of forceful agents, the society is able to aggregate information effectively. Theorem 2 then also implies that in this case $\pi_i = \bar{\pi}_i = 1/n$ for all $i$ (i.e., beliefs converge to a deterministic value), so that no individual has excess influence. These results no longer hold when there are forceful agents. In the next section, we investigate the effect of the forceful agents and the structure of the social network on the extent of misinformation and excess influence of individuals.
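Corollary 1 can be illustrated with a short simulation in which every meeting results in averaging. This is a sketch under assumptions not taken from the paper: one agent is picked uniformly at each step, $\gamma_{ij} = 0$, and the five-agent ring and the function name `simulate_averaging` are hypothetical.

```python
import random


def simulate_averaging(x0, p, steps, seed):
    """Pairwise averaging only (no forceful agents, no idle meetings)."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        i = rng.randrange(n)                        # agent i is picked uniformly
        j = rng.choices(range(n), weights=p[i])[0]  # i meets j with probability p_ij
        x[i] = x[j] = 0.5 * (x[i] + x[j])           # the sum of beliefs is preserved
    return x


# Hypothetical ring of five agents, each meeting either neighbor equally often.
n = 5
p = [[0.0] * n for _ in range(n)]
for i in range(n):
    p[i][(i + 1) % n] = 0.5
    p[i][(i - 1) % n] = 0.5

x0 = [0.0, 1.0, 2.0, 3.0, 4.0]
average = sum(x0) / n
final = simulate_averaging(x0, p, steps=5000, seed=7)
print(final[0], average)
```

Because every update preserves the sum of beliefs, the common limit must equal the initial average, whatever the order of meetings.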



4 Global Limits on Misinformation 

In this section, we are interested in providing an upper bound on the expected value of the difference between the consensus belief $\bar{x}$ (cf. Theorem 1) and the true underlying state, $\theta$ (or equivalently the average of the initial beliefs), i.e.,

\[ E[\bar{x} - \theta] = E[\bar{x}] - \theta = \sum_{i=1}^{n} \Big(\bar{\pi}_i - \frac{1}{n}\Big) x_i(0) \qquad (13) \]

(cf. Theorem 2). Our bound relies on a fundamental theorem from the perturbation theory of finite Markov Chains. Before presenting the theorem, we first introduce some terminology and basic results related to Markov Chains.
4.1 Preliminary Results

Consider a finite Markov Chain with $n$ states and transition probability matrix $T$. We say that a finite Markov chain is regular if its transition probability matrix is a primitive matrix, i.e., there exists some integer $k > 0$ such that all entries of the power matrix $T^k$ are positive. The following theorem states basic results on the limiting behavior of products of transition matrices of Markov Chains (see Theorems 4.1.4, 4.1.6, and 4.3.1 in Kemeny and Snell [26]).

Theorem 3. Consider a regular Markov Chain with $n$ states and transition probability matrix $T$.

(a) The $k$th power of the transition matrix $T$, $T^k$, converges to a stochastic matrix $T^\infty$ with all rows equal to the probability vector $\pi$, i.e.,

\[ \lim_{k\to\infty} T^k = T^\infty = e\pi', \]

where $e$ is the $n$-dimensional vector of all ones.

(b) The probability vector $\pi$ is a left eigenvector of the matrix $T$, i.e.,

\[ \pi' T = \pi' \quad \text{and} \quad \pi' e = 1. \]

The vector $\pi$ is referred to as the stationary distribution of the Markov Chain.

(c) The matrix $Y = (I - T + T^\infty)^{-1} - T^\infty$ is well-defined and is given by

\[ Y = \sum_{k=0}^{\infty} (T^k - T^\infty). \]

The matrix $Y$ is referred to as the fundamental matrix of the Markov Chain.

The following theorem provides an exact perturbation result for the stationary distribution of a regular Markov Chain in terms of its fundamental matrix. The theorem is based on a result due to Schweitzer [32] (see also Haviv and Van Der Heyden [23]).

Theorem 4. Consider a regular Markov Chain with $n$ states and transition probability matrix $T$. Let $\pi$ denote its unique stationary distribution and $Y$ denote its fundamental matrix. Let $D$ be an $n \times n$ perturbation matrix such that the sum of the entries in each row is equal to 0, i.e.,

\[ \sum_{j=1}^{n} [D]_{ij} = 0 \quad \text{for all } i. \]

Assume that the perturbed Markov chain with transition matrix $\hat{T} = T + D$ is regular. Then, the perturbed Markov chain has a unique stationary distribution $\hat{\pi}$, and the matrix $I - DY$ is nonsingular. Moreover, the change in the stationary distributions, $\rho = \hat{\pi} - \pi$, is given by

\[ \rho' = \pi' D Y (I - DY)^{-1}. \]

Footnote: We use the same notation as in (10) here, given the close connection between the matrices introduced in Theorems 3 and 4 and the ones in (10).
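The exact perturbation formula can be verified numerically on a small chain. The sketch below is an illustration, not part of the original argument: the matrices `T` and `D` are a hypothetical three-state example, the fundamental matrix is approximated by truncating its series, and $\rho$ is obtained by solving the linear system $\rho'(I - DY) = \pi' DY$ instead of forming the inverse explicitly.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def stationary(P, iters=2000):
    """Approximate the stationary distribution by iterating pi' <- pi' P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[r] * P[r][c] for r in range(n)) for c in range(n)]
    return pi


def fundamental_matrix(T, pi, terms=500):
    """Truncated series Y = sum_k (T^k - T_inf), where T_inf = e pi'."""
    n = len(T)
    Y = [[0.0] * n for _ in range(n)]
    Tk = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(terms):
        for r in range(n):
            for c in range(n):
                Y[r][c] += Tk[r][c] - pi[c]
        Tk = matmul(Tk, T)
    return Y


# Hypothetical regular chain T and zero-row-sum perturbation D.
T = [[2 / 3, 1 / 6, 1 / 6], [1 / 6, 2 / 3, 1 / 6], [1 / 6, 1 / 6, 2 / 3]]
D = [[-1 / 30, 1 / 30, 0.0], [-1 / 24, 1 / 24, 0.0], [0.0, 0.0, 0.0]]
n = len(T)

pi = stationary(T)
DY = matmul(D, fundamental_matrix(T, pi))

# rho' (I - DY) = pi' DY is equivalent to (I - DY)' rho = (DY)' pi.
A = [[(1.0 if r == c else 0.0) - DY[c][r] for c in range(n)] for r in range(n)]
b = [sum(pi[r] * DY[r][c] for r in range(n)) for c in range(n)]
rho = solve(A, b)

T_hat = [[T[r][c] + D[r][c] for c in range(n)] for r in range(n)]
pi_hat = stationary(T_hat)
max_err = max(abs(pi[k] + rho[k] - pi_hat[k]) for k in range(n))
print(max_err)
```

Here `pi_hat` is computed directly from the perturbed chain, while `pi + rho` comes from the formula of Theorem 4; in this sketch the two agree up to numerical error.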



4.2 Main Results 

This subsection provides bounds on the difference between the consensus distribution and the uniform distribution using the global properties of the underlying social network. Our method of analysis will rely on the decomposition of the mean interaction matrix $W$ given in (10) into the social network matrix $T$ and the influence matrix $D$. Recall that $T$ is doubly stochastic.

The next theorem provides our first result on characterizing the extent of misinformation and establishes an upper bound on the $\ell_\infty$-norm of the difference between the stationary distribution $\bar{\pi}$ and the uniform distribution $\frac{1}{n}e$, which, from Eq. (13), also provides a bound on the deviation between expected beliefs and the true underlying state, $\theta$.

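Theorem 5. (a) Let $\bar{\pi}$ denote the consensus distribution. The $\ell_\infty$-norm of the difference between $\bar{\pi}$ and $\frac{1}{n}e$ satisfies

\[ \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_\infty \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n}, \]

where $\delta$ is a constant defined by

\[ \delta = (1 - n\chi^{d})^{1/d}, \qquad \chi = \min_{(i,j)\in\mathcal{E}} \Big\{ \frac{1}{n}\Big[ p_{ij}\frac{1-\gamma_{ij}}{2} + p_{ji}\frac{1-\gamma_{ji}}{2} \Big] \Big\}, \]

and $d$ is the maximum shortest path length in the graph $(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)].

(b) Let $\bar{x}$ be the limiting random variable of the sequences $\{x_i(k)\}$, $i \in \mathcal{N}$, generated by Eq. (4) (cf. Theorem 1). We have

\[ \Big| E[\bar{x}] - \frac{1}{n}\sum_{i=1}^{n} x_i(0) \Big| \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n}\,\|x(0)\|_1. \]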

Proof. (a) Recall that the mean interaction matrix can be represented as

\[ W = T + D \]

[cf. Eq. (10)], i.e., $W$ can be viewed as a perturbation of the social network matrix $T$ by influence matrix $D$. By Lemma 10(a), the stationary distribution of the Markov chain with transition probability matrix $T$ is given by the uniform distribution $\frac{1}{n}e$. By the definition of the matrix $D$ [cf. Eq. (9)] and the fact that the matrices $A_{ij}$ and $J_{ij}$ are stochastic matrices with all row sums equal to one [cf. Eq. (5)], it follows that the sum of entries of each row of $D$ is equal to 0. Moreover, by Theorem 2(a), the Markov Chain with transition probability matrix $W$ is regular and has a stationary distribution $\bar{\pi}$. Therefore, we can use the exact perturbation result given in Theorem 4 to write the change in the stationary distributions $\frac{1}{n}e$ and $\bar{\pi}$ as

\[ \Big(\bar{\pi} - \frac{1}{n}e\Big)' = \frac{1}{n}\, e' D Y (I - DY)^{-1}, \qquad (14) \]

where $Y$ is the fundamental matrix of the Markov Chain with transition probability matrix $T$, i.e.,

\[ Y = \sum_{k=0}^{\infty} (T^k - T^\infty), \]

with $T^\infty = \frac{1}{n}ee'$ [cf. Theorem 3(c)]. Algebraic manipulation of Eq. (14) yields

\[ \Big(\bar{\pi} - \frac{1}{n}e\Big)' = \bar{\pi}' D Y, \]

implying that

\[ \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_\infty \le \|DY\|_\infty, \qquad (15) \]

where $\|DY\|_\infty$ denotes the matrix norm induced by the $\ell_\infty$ vector norm.

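We next obtain an upper bound on the matrix norm $\|DY\|_\infty$. By the definition of the fundamental matrix $Y$, we have

\[ DY = \sum_{k=0}^{\infty} D(T^k - T^\infty) = \sum_{k=0}^{\infty} D T^k, \qquad (16) \]

where the second equality follows from the fact that the row sums of matrix $D$ are equal to 0 and the matrix $T^\infty$ is given by $T^\infty = \frac{1}{n}ee'$.

Given any $z(0) \in \mathbb{R}^n$ with $\|z(0)\|_\infty = 1$, let $\{z(k)\}$ denote the sequence generated by the linear update rule

\[ z(k) = T^k z(0) \quad \text{for all } k \ge 0. \]

Then, for all $k \ge 0$, we have

\[ D T^k z(0) = D z(k), \]

which by the definition of the matrix $D$ [cf. Eq. (9)] implies

\[ D T^k z(0) = \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\, z^{ij}(k), \qquad (17) \]

where the vector $z^{ij}(k) \in \mathbb{R}^n$ is defined as

\[ z^{ij}(k) = [J_{ij} - A_{ij}]\, z(k) \quad \text{for all } i,j, \text{ and } k \ge 0. \]

By the definition of the matrices $J_{ij}$ and $A_{ij}$ [cf. Eq. (5)], the entries of the vector $z^{ij}(k)$ are given by

\[ [z^{ij}(k)]_l = \begin{cases} \big(\tfrac{1}{2} - \epsilon\big)\big(z_j(k) - z_i(k)\big) & \text{if } l = i, \\ \tfrac{1}{2}\big(z_j(k) - z_i(k)\big) & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (18) \]

This implies that the vector norm $\|z^{ij}(k)\|_\infty$ can be upper-bounded by

\[ \|z^{ij}(k)\|_\infty \le \frac{1}{2}\Big(\max_{l\in\mathcal{N}} z_l(k) - \min_{l\in\mathcal{N}} z_l(k)\Big) \quad \text{for all } i,j, \text{ and } k \ge 0. \]

Defining $M(k) = \max_{l\in\mathcal{N}} z_l(k)$ and $m(k) = \min_{l\in\mathcal{N}} z_l(k)$ for all $k \ge 0$, this implies that

\[ \|z^{ij}(k)\|_\infty \le \frac{1}{2}\big(M(k) - m(k)\big) \le \frac{1}{2}\,\delta^k \big(M(0) - m(0)\big) \quad \text{for all } i,j, \text{ and } k \ge 0, \]

where the second inequality follows from Lemma 10(b) in Appendix C. Combining the preceding relation with Eq. (17), we obtain

\[ \|D T^k z(0)\|_\infty \le \frac{1}{2n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\delta^k \big(M(0) - m(0)\big). \]

By Eq. (16), it follows that

\[ \|D Y z(0)\|_\infty \le \sum_{k=0}^{\infty} \|D T^k z(0)\|_\infty \le \frac{1}{2n}\sum_{k=0}^{\infty}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\delta^k \big(M(0) - m(0)\big) \le \frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n(1-\delta)}, \]

where to get the last inequality, we used the fact that $0 \le \delta < 1$ and $M(0) - m(0) \le 1$, which follows from $\|z(0)\|_\infty = 1$. Since $z(0)$ is an arbitrary vector with $\|z(0)\|_\infty = 1$, this implies that

\[ \|DY\|_\infty = \max_{\|z\|_\infty = 1} \|DYz\|_\infty \le \frac{1}{2n(1-\delta)}\sum_{i,j} p_{ij}\alpha_{ij}. \]

Combining this bound with Eq. (15), we obtain

\[ \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_\infty \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n}, \]

establishing the desired relation.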
(b) By Theorem 2(b), we have

\[ E[\bar{x}] = \bar{\pi}' x(0). \]

This implies that

\[ \Big| E[\bar{x}] - \frac{1}{n}\sum_{i=1}^{n} x_i(0) \Big| = \Big| \bar{\pi}' x(0) - \frac{1}{n}e' x(0) \Big| \le \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_\infty \|x(0)\|_1. \]

The result follows by combining this relation with part (a). □



Before providing the intuition for the preceding theorem, we provide a related bound on the $\ell_2$-norm of the difference between $\bar{\pi}$ and the uniform distribution $\frac{1}{n}e$ in terms of the second largest eigenvalue of the social network matrix $T$, and then return to the intuition for both results.

Theorem 6. Let $\bar{\pi}$ denote the consensus distribution (cf. Theorem 2). The $\ell_2$-norm of the difference between $\bar{\pi}$ and $\frac{1}{n}e$ satisfies

\[ \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_2 \le \frac{1}{1 - \lambda_2(T)}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n}, \]

where $\lambda_2(T)$ is the second largest eigenvalue of the matrix $T$ defined in Eq. (9).
Proof. Following a similar argument as in the proof of Theorem 5, we obtain

\[ \Big\| \bar{\pi} - \frac{1}{n}e \Big\|_2 \le \|DY\|_2, \qquad (19) \]

where $\|DY\|_2$ is the matrix norm induced by the $\ell_2$ vector norm. To obtain an upper bound on the matrix norm $\|DY\|_2$, we consider an initial vector $z(0) \in \mathbb{R}^n$ with $\|z(0)\|_2 = 1$ and the sequence generated by

\[ z(k+1) = T z(k) \quad \text{for all } k \ge 0. \]

Then, for all $k \ge 0$, we have

\[ D T^k z(0) = \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\, z^{ij}(k), \qquad (20) \]

where the entries of the vector $z^{ij}(k)$ are given by Eq. (18). We can provide an upper bound on $\|z^{ij}(k)\|_2^2$ as

\[ \|z^{ij}(k)\|_2^2 \le \frac{1}{2}\big(z_j(k) - z_i(k)\big)^2 = \frac{1}{2}\Big(\big(z_j(k) - \bar{z}\big) + \big(\bar{z} - z_i(k)\big)\Big)^2, \]

where $\bar{z} = \frac{1}{n}\sum_{l=1}^{n} z_l(k)$ for all $k$ (note that since $T$ is a doubly stochastic matrix, the average of the entries of the vector $z(k)$ is the same for all $k$). Using the relation $(a+b)^2 \le 2(a^2 + b^2)$ for any scalars $a$ and $b$, this yields

\[ \|z^{ij}(k)\|_2^2 \le \sum_{l=1}^{n} \big(z_l(k) - \bar{z}\big)^2 = \|z(k) - \bar{z}e\|_2^2. \qquad (21) \]

We have

\[ z(k+1) - \bar{z}e = T z(k) - \bar{z}e = T\big(z(k) - \bar{z}e\big), \]

where the second equality follows from the stochasticity of the matrix $T$, implying that $Te = e$. The vector $z(k) - \bar{z}e$ is orthogonal to the vector $e$, which is the eigenvector corresponding to the largest eigenvalue $\lambda_1 = 1$ of matrix $T$ (note that $\lambda_1 = 1$ since $T$ is a primitive and stochastic matrix). Hence, using the variational characterization of eigenvalues, we obtain

\[ \|z(k+1) - \bar{z}e\|_2^2 \le \big(z(k) - \bar{z}e\big)' T^2 \big(z(k) - \bar{z}e\big) \le \lambda_2(T)^2\, \|z(k) - \bar{z}e\|_2^2, \]

where $\lambda_2(T)$ is the second largest eigenvalue of matrix $T$, which implies

\[ \|z(k) - \bar{z}e\|_2^2 \le \big(\lambda_2(T)^2\big)^k\, \|z(0) - \bar{z}e\|_2^2 \le \lambda_2(T)^{2k}. \]

Here the second inequality follows from the fact that $\|z(0)\|_2 = 1$ and $\bar{z}$ is the average of the entries of vector $z(0)$. Combining the preceding relation with Eq. (21), we obtain

\[ \|z^{ij}(k)\|_2 \le \lambda_2(T)^k \quad \text{for all } k \ge 0. \]

By Eq. (20), this implies that

\[ \|D T^k z(0)\|_2 \le \frac{1}{n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\lambda_2(T)^k \quad \text{for all } k \ge 0. \]

Using the definition of the fundamental matrix $Y$, we obtain

\[ \|D Y z(0)\|_2 = \Big\| \sum_{k=0}^{\infty} D T^k z(0) \Big\|_2 \le \sum_{k=0}^{\infty} \frac{1}{n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\lambda_2(T)^k = \frac{1}{1 - \lambda_2(T)}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n}, \]

for any vector $z(0)$ with $\|z(0)\|_2 = 1$. Combined with Eq. (19), this yields the desired result. □

Theorem 6 characterizes the variation of the stationary distribution in terms of the average influence, $\frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}$, and the second largest eigenvalue of the social network matrix $T$, $\lambda_2(T)$. As is well known, the difference $1 - \lambda_2(T)$, also referred to as the spectral gap, governs the rate of convergence of the Markov Chain induced by the social network matrix $T$ to its stationary distribution (see [10]). In particular, the larger $1 - \lambda_2(T)$ is, the faster the $k$th power of the transition probability matrix converges to the stationary distribution matrix (cf. Theorem 3). When the Markov chain converges to its stationary distribution rapidly, we say that the Markov chain is fast-mixing. In this light, Theorem 6 shows that, in a fast-mixing graph, given a fixed average influence $\frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}$, the consensus distribution is "closer" to the underlying $\theta = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$ and the extent of misinformation is limited. This is intuitive. In a fast-mixing social network graph, there are several connections between any pair of agents. Now for any forceful agent, consider the set of agents who will have some influence on his beliefs. This set itself is connected to the rest of the agents and thus obtains information from the rest of the society. Therefore, in a fast-mixing graph (or in a society represented by such a graph), the beliefs of forceful agents will themselves be moderated by the rest of the society before they spread widely. In contrast, in a slowly-mixing graph, we can have a high degree of clustering around forceful agents, so that forceful agents get their (already limited) information intake mostly from the same agents that they have influenced. If so, there will be only a very indirect connection from the rest of the society to the beliefs of forceful agents and forceful agents will spread their information widely before their opinions also adjust. As a result, the consensus is more likely to be much closer to the opinions of forceful agents, potentially quite different from the true underlying state $\theta$.

Footnote: We use the terms "spectral gap of the Markov chain" and "spectral gap of the (induced) graph", and "fast-mixing Markov chain" and "fast-mixing graph" interchangeably in the sequel.

This discussion also gives intuition for Theorem 5 since the constant $\delta$ in that result is closely linked to the mixing properties of the social network matrix and the social network graph. In particular, Theorem 5 clarifies that $\delta$ is related to the maximum shortest path and the minimum probability of (indirect) communication between any two agents in the society. These two notions also crucially influence the spectral gap $1 - \lambda_2(T)$, which plays the key role in Theorem 6.
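To make the role of the spectral gap concrete, the following sketch compares a ring with a complete graph on eight agents and evaluates the right-hand side of the bound in Theorem 6. It is an illustration under assumptions: $\gamma_{ij} = 0$, so that $T$ is an average of averaging matrices and therefore symmetric and positive semidefinite (which lets a power iteration on the subspace orthogonal to $e$ find $\lambda_2(T)$); the function names and the value of `total_influence` are hypothetical.

```python
import math


def social_network_matrix(p):
    """T = (1/n) * sum_{i,j} p_ij A_ij, i.e. Eq. (9) with gamma_ij = 0 (assumed)."""
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = p[i][j] / n
            if i == j or w == 0.0:
                continue
            for l in range(n):
                T[l][l] += w
            T[i][i] -= 0.5 * w
            T[j][j] -= 0.5 * w
            T[i][j] += 0.5 * w
            T[j][i] += 0.5 * w
    return T


def second_eigenvalue(T, iters=5000):
    """Power iteration restricted to vectors orthogonal to e."""
    n = len(T)
    v = [float(l) for l in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]                   # project out the direction e
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        Tv = [sum(T[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = sum(a * b for a, b in zip(v, Tv))     # Rayleigh quotient
        v = Tv
    return lam


n = 8
ring = [[0.0] * n for _ in range(n)]
for i in range(n):
    ring[i][(i + 1) % n] = 0.5
    ring[i][(i - 1) % n] = 0.5
complete = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]

lam_ring = second_eigenvalue(social_network_matrix(ring))
lam_complete = second_eigenvalue(social_network_matrix(complete))

total_influence = 0.5   # hypothetical value of sum_{i,j} p_ij * alpha_ij
bound_ring = total_influence / (n * (1.0 - lam_ring))
bound_complete = total_influence / (n * (1.0 - lam_complete))
print(lam_ring, lam_complete, bound_ring, bound_complete)
```

For the same total influence, the slowly mixing ring has the smaller spectral gap and hence the looser bound, in line with the discussion above.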

These intuitions are illustrated in the next example, which shows how in a certain 
class of graphs, misinformation becomes arbitrarily small as the social network grows. 

Example 1. (Expander Graphs) Consider a sequence of social network graphs $\mathcal{G}_n = (\mathcal{N}_n, \mathcal{A}_n)$ induced by symmetric $n \times n$ matrices $T_n$ [cf. Eq. (11)]. Assume that this sequence of graphs is a family of expander graphs, i.e., there exists a positive constant $\gamma > 0$ such that the spectral gap $1 - \lambda_2(T_n)$ of the graph is uniformly bounded away from 0, independent of the number of nodes $n$ in the graph, i.e.,

\[ \gamma \le 1 - \lambda_2(T_n) \quad \text{for all } n \]

(see [13]). As an example, the Internet has been shown to be an expander graph under the preferential connectivity random graph model (see [27] and [21]). Expander graphs have high connectivity properties and are fast mixing.

We consider the following influence structure superimposed on the social network graph $\mathcal{G}_n$. We define an agent $j$ to be locally forceful if he influences a constant number of agents in the society, i.e., his total influence, given by $\sum_i p_{ij}\alpha_{ij}$, is a constant independent of $n$. We assume that there is a constant number of locally forceful agents. Let $\bar{\pi}_n$ denote the stationary distribution of the Markov Chain with transition probability matrix given by the mean interaction matrix $W$ [cf. Eq. (8)]. Then, it follows from Theorem 6 that

\[ \Big\| \bar{\pi}_n - \frac{1}{n}e \Big\|_2 \to 0 \quad \text{as } n \to \infty. \]
Figure 1: Impact of location of forceful agents on the stationary distribution. (a) Misinformation over the bottleneck. (b) Misinformation inside a cluster.



This shows that if the social network graph is fast-mixing and there is a constant number of locally forceful agents, then the difference between the consensus belief and the average of the initial beliefs vanishes. Intuitively, in expander graphs, as $n$ grows large, the set of individuals who are the source of information of forceful agents becomes highly connected, and thus rapidly inherits the average of the information of the rest of the society. Provided that the number of forceful agents and the impact of each forceful agent do not grow with $n$, their influence becomes arbitrarily small as $n$ increases.



5 Connectivity of Forceful Agents and Misinformation

The results provided so far exploit the decomposition of the evolution of beliefs into 
the social network component (matrix T) and the influence component (matrix D). 
This decomposition does not exploit the interactions between the structure of the social 
network and the location of forceful agents within it. For example, forceful agents located 
in different parts of the same social network will have different impacts on the extent 
of misinformation in the society, but our results so far do not capture this aspect. The 
following example illustrates these issues in a sharp way. 

Example 2. Consider a society consisting of six agents and represented by the (undirected) social network graph shown in Figure 1. The weight of each edge $\{i,j\}$ is given by the corresponding entry $T_{ij}$ of the social network matrix, where, for illustration, we choose $p_{ij}$ to be inversely proportional to the degree of node $i$, for all $j$. The self-loops are not shown in Figure 1.

We distinguish two different cases as illustrated in Figure 1. In each case, there is a single forceful agent and $\alpha = 1/2$. This is represented by a directed forceful link. The two cases differ by the location of the forceful link, i.e., the forceful link is over the bottleneck of the connectivity graph in part (a) and inside the left cluster in part (b). The corresponding consensus distributions can be computed as



Even though the social network matrix T (and the corresponding graph) is the same in 
both cases, the consensus distributions are different. In particular, in part (a), each agent 
in the left cluster has a higher weight compared to the agents in the right cluster, while 
in part (b), the weight of all agents, except for the forceful and influenced agents, are 
equal and given by 1/6. This is intuitive since when the forceful link is over a bottleneck, 
the misinformation of a forceful agent can spread and influence a larger portion of the 
society before his opinions can be moderated by the opinions of the other agents. 

This example shows how the extent of spread of misinformation varies depending 
on the location of the forceful agent. The rest of this section provides a more detailed 
analysis of how the location and connectivity of forceful agents affect the formation of 
opinions in the network. We proceed as follows. First, we provide an alternative exact 
characterization of excess influence using mean first passage times. We then introduce
the concept of essential edges, similar to the situation depicted in Example 2, and provide
sharper exact results for graphs in which forceful links coincide with essential edges. We
then generalize these notions to more general networks by introducing the concept of 
information bottlenecks, and finally, we develop new techniques for determining tighter 
upper bounds on excess influence by using ideas from graph clustering. 

5.1 Characterization in Terms of Mean First Passage Times 

Our next main result provides an exact characterization of the excess influence of agent $i$ in terms of the mean first passage times of the Markov chain with transition probability matrix $T$. This result, and those that follow later in this section, will be useful both to provide more informative bounds on the extent of misinformation and also to highlight the sources of excess influence for certain agents in the society.

We start with presenting some basic definitions and relations (see Chapter 2 of [2]). 

Definition 1. Let $(X_t,\ t = 0, 1, 2, \ldots)$ denote a discrete-time Markov chain. We denote the first hitting time of state $i$ by

\[ T_i = \inf\{t \ge 0 \mid X_t = i\}. \]

We define the mean first passage time from state $i$ to state $j$ as

\[ m_{ij} = E[T_j \mid X_0 = i], \]

and the mean commute time between state $i$ and state $j$ as $m_{ij} + m_{ji}$. Moreover, we define the mean first return time to a particular state $i$ as

\[ m_i^{+} = E[T_i^{+} \mid X_0 = i], \]

where

\[ T_i^{+} = \inf\{t \ge 1 \mid X_t = i\}. \]

Lemma 1. Consider a Markov chain with transition matrix $Z$ and stationary distribution $\pi$. We have:

(i) The mean first return time from state $i$ to $i$ is given by $m_i^{+} = 1/\pi_i$.

(ii) The mean first passage time from $i$ to $j$ is given by

\[ m_{ij} = \frac{Y_{jj} - Y_{ij}}{\pi_j}, \]

where $Y = \sum_{k=0}^{\infty} (Z^k - Z^\infty)$ is the fundamental matrix of the Markov chain.

We use the relations in the preceding lemma between the fundamental matrix of a Markov chain and the mean first passage times between states, to provide an exact characterization of the excess influence of agent $k$.

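Theorem 7. Let $\bar{\pi}$ denote the consensus distribution. We have:

(a) For every agent $k$,

\[ \bar{\pi}_k - \frac{1}{n} = \sum_{i,j} \frac{p_{ij}\alpha_{ij}}{2n^2}\big((1 - 2\epsilon)\bar{\pi}_i + \bar{\pi}_j\big)\big(m_{ik} - m_{jk}\big). \]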

(b) Let Aj denote the set of edges over which there is a forceful link, i.e., 

Aj = {{?, j} eA\aij>0 or aji > o|. 

Assume that for any {i,j}, {k, 1} G Aj, we have {i,j} H {k, 1} = 0. Then, 

1 1 sr--Pijaij{l - e) 



where 

Cij — Cji 



(- \ -1 1 _ rl -d \ 



and rriij is the mean first passage time from state i to state j of a Markov chain with 
transition matrix given by the social network matrix T [cf. Eq. (jH])]. 
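Proof. (a) Following the same line of argument as in the proof of Theorem 5, we can use the perturbation results of Theorem 4 to write the excess influence of agent $k$ as

\[ \bar{\pi}_k - \frac{1}{n} = \bar{\pi}' D\, [Y]^k, \qquad (23) \]

where $Y$ is the fundamental matrix of a Markov chain with transition matrix $T$ and $[Y]^k$ is its $k$th column. Using (5) and the definition of $D$ in (9), for each pair $(i,j)$ the $l$th entry of the vector $\frac{p_{ij}\alpha_{ij}}{n}[J_{ij} - A_{ij}][Y]^k$ is given by

\[ \frac{p_{ij}\alpha_{ij}}{n} \times \begin{cases} \big(\tfrac{1}{2} - \epsilon\big)\big(Y_{jk} - Y_{ik}\big) & \text{if } l = i, \\ \tfrac{1}{2}\big(Y_{jk} - Y_{ik}\big) & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases} \]

Hence, we can write the right-hand side of Eq. (23) as follows:

\[ \bar{\pi}_k - \frac{1}{n} = \sum_{i,j} \frac{p_{ij}\alpha_{ij}}{2n}\big((1 - 2\epsilon)\bar{\pi}_i + \bar{\pi}_j\big)\big(Y_{jk} - Y_{ik}\big). \qquad (24) \]

By Lemma 1(ii), we have

\[ Y_{jk} - Y_{ik} = \frac{1}{n}\big(m_{ik} - m_{jk}\big), \qquad (25) \]

where $Y$ is the fundamental matrix of the Markov chain with transition matrix $T$. The desired result follows by substituting the preceding relation in Eq. (24).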

(b) In view of the assumption that all edges in Aj are pairwise disjoint, the perturbation 
matrix D decomposes into disjoint blocks, i.e., 

n 



D= Yl Dij + Dj,, where A, = [J.i - Ai,] . (26) 



For each edge G Ax, it is straightforward to show that 

((A, + D,,)Yy = (l - ^) (A, + D,,)Y 
Using the decomposition in Eq. fl2Bl) and the preceding relation, it can be seen that 



DYiI-DYr = J2{l-^^)"D^^Y. 



Combined with the exact perturbation result in Theorem HI this implies that 

Ttk-- = -[e'DY{I-DY)-% 
n n 

- ^E(i-|)"i''A,n 

hi 

= Yl l-QJn^ {Y,k-Y,k). 
The main result follows by substituting Eq. (125!) in the above equation. □ 
Part (a) of Theorem 7 provides an exact expression for the excess influence of agent $k$ as a function of the mean first passage times from the forceful and influenced agents to agent (or state) $k$. The excess influence of each agent therefore depends on the relative distance of that agent to the forceful and the influenced agent. To provide an intuition for this result, let us consider the special case in which there is a single forceful link $(j,i)$ in the society (i.e., only one pair of agents $i$ and $j$ with $\alpha_{ij} > 0$) and thus a single forceful agent $j$. Then for any agent $k$, their only source of excess influence can come from their (potentially indirect) impact on the beliefs of the forceful agent $j$. This is why $m_{jk}$, which, loosely speaking, measures the distance between $j$ and $k$, enters negatively into the expression for the excess influence of agent $k$. In addition, any agent who meets (communicates) with agent $i$ with a high probability will be indirectly influenced by the opinions of the forceful agent $j$. Therefore, the excess influence of agent $k$ is increasing in his distance to $i$, thus in $m_{ik}$. In particular, in the extreme case where $m_{ik}$ is small, agent $k$ will have negative excess influence (because he is very close to the heavily "influenced" agent $i$) and in the polar extreme, where $m_{jk}$ is small, he will have positive excess influence (because his views will be quickly heard by the forceful agent $j$). The general expression in part (a) of the theorem simply generalizes this reasoning to general social networks with multiple forceful agents and several forceful links.
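The identity in part (a) can be checked numerically. The sketch below is an illustration under assumptions, not the authors' computation: it assumes the forms of $A_{ij}$ and $J_{ij}$ implied by Eq. (18), computes the mean first passage times of the chain $T$ by solving the usual first-step equations, and uses a hypothetical four-agent line network in which agent 2 is forceful towards agent 1. All function names and parameter values are made up for the example.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def stationary(P, iters=5000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[r] * P[r][c] for r in range(n)) for c in range(n)]
    return pi


def build_T_W(p, alpha, gamma, eps):
    """Return the social network matrix T and the mean interaction matrix W = T + D."""
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = p[i][j] / n
            if i == j or w == 0.0:
                continue
            a = 0.5 * w * (1.0 - gamma[i][j])
            for M in (T, W):                 # common part: w*I + w*(1-gamma)*(A_ij - I)
                for l in range(n):
                    M[l][l] += w
                M[i][i] -= a
                M[j][j] -= a
                M[i][j] += a
                M[j][i] += a
            f = w * alpha[i][j]              # influence part f*(J_ij - A_ij), W only
            W[i][i] += f * (eps - 0.5)
            W[i][j] += f * (0.5 - eps)
            W[j][i] -= f * 0.5
            W[j][j] += f * 0.5
    return T, W


def passage_times(T):
    """m[i][k] = expected number of steps to reach k from i (m[k][k] = 0)."""
    n = len(T)
    m = [[0.0] * n for _ in range(n)]
    for k in range(n):
        idx = [i for i in range(n) if i != k]
        A = [[(1.0 if a == b else 0.0) - T[a][b] for b in idx] for a in idx]
        for a, v in zip(idx, solve(A, [1.0] * len(idx))):
            m[a][k] = v
    return m


# Hypothetical line network 0 - 1 - 2 - 3; agent 2 is forceful towards agent 1.
n, eps = 4, 0.1
p = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5], [0.0, 0.0, 1.0, 0.0]]
gamma = [[0.0] * n for _ in range(n)]
alpha = [[0.0] * n for _ in range(n)]
alpha[1][2] = 0.5

T, W = build_T_W(p, alpha, gamma, eps)
pi_bar = stationary(W)
m = passage_times(T)

excess = []
for k in range(n):
    s = 0.0
    for i in range(n):
        for j in range(n):
            c = p[i][j] * alpha[i][j] / (2.0 * n * n)
            s += c * ((1.0 - 2.0 * eps) * pi_bar[i] + pi_bar[j]) * (m[i][k] - m[j][k])
    excess.append(s)

max_gap = max(abs(pi_bar[k] - 1.0 / n - excess[k]) for k in range(n))
print(pi_bar, max_gap)
```

Here `pi_bar[k] - 1/n` is computed directly from the stationary distribution of $W$, while `excess[k]` evaluates the right-hand side of Theorem 7(a); in this sketch `max_gap` is zero up to numerical error.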

Part (b) provides an alternative expression [cf. Eq. (22)], with a similar intuition for the special case in which all forceful links are disjoint. The main advantage of the expression in part (b) is that, though more complicated, it is not in terms of the expected consensus distribution $\bar{\pi}$ (which is endogenous). The disjoint forceful link property in part (b) is also useful because it enables us to isolate the effects of the forceful agents. The parameter $\zeta_{ij}$ in Eq. (22) captures the asymmetry between the locations of agents $i$ and $j$ in the underlying social network graph. Although the expression for excess influence in part (a) of Theorem 7 is a function of the consensus distribution $\bar{\pi}$, each element of this vector (distribution) can be bounded by 1 to obtain an upper bound for the excess influence of agent $k$.

Using the results in Theorem 7, the difference between the consensus distributions
discussed in Example 2 can be explained as follows. In Example 2(a), the mean first
passage time from agent 4 to any agent $k$ in the left cluster is strictly larger than that
of agent 3 to agent $k$, because every path from agent 4 to the left cluster must pass
through agent 3. Therefore, $m_{4k} > m_{3k}$ for $k = 1, 2, 3$, and agents in the left cluster have
a higher consensus weight. In Example 2(b), due to the symmetry of the social network
graph, the mean first passage times of agents 1 and 2 to any agent $k \neq 1, 2$ are the same,
hence establishing by Theorem 7 the uniform weights in the consensus distribution.

In the following we study the effect of the location of a forceful link on the excess
influence of each agent by characterizing the relative mean first passage time $|m_{ik} - m_{jk}|$
in terms of the properties of the social network graph.

5.2 Forceful Essential Edges

In this subsection, we provide an exact characterization of the excess influence of agent $k$
explicitly in terms of the properties of the social network graph. We focus on the special
case when the undirected edge between the forceful and the influenced agent is essential
for the social network graph, in the sense that without this edge the graph would be
disconnected. We refer to such edges as forceful essential edges. Graphs with forceful
essential edges approximate situations in which a forceful agent, for example a media
outlet or political leader, itself obtains all of its information from a tight-knit community.
We first give the definition of an essential edge of an undirected graph.

Definition 2. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be an undirected graph. An edge $\{i,j\} \in \mathcal{A}$ is an
essential edge of the graph $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ if its removal would partition the set of nodes
into two disjoint sets $N(i,j) \subset \mathcal{N}$ with $i \in N(i,j)$, and $N(j,i) \subset \mathcal{N}$ with $j \in N(j,i)$.

The following lemma provides an exact characterization of the mean first passage
time from state $i$ to state $j$, where $i$ and $j$ are the end nodes of an essential edge $\{i,j\}$.

Lemma 2. Consider a Markov chain with a doubly stochastic transition probability
matrix $T$. Let $\{i,j\}$ be an essential edge of the social network graph induced by matrix
$T$.

(a) We have
$$m_{ij} = \frac{|N(i,j)|}{T_{ij}}.$$

(b) For every $k \in N(j,i)$,
$$m_{ik} - m_{jk} = m_{ij}.$$

Proof. Consider a Markov chain over the set of states $\mathcal{N}' = N(i,j) \cup \{j\}$, with transition
probabilities
$$\tilde{T}_{kl} = T_{kl}, \quad \text{for all } k \neq l.$$
For the new chain with stationary distribution $\tilde{\pi}$ we have
$$\tilde{\pi}_j = \frac{T_{ij}}{\tilde{T}} = \frac{T_{ij}}{|N(i,j)| + T_{ij}},$$
where $\tilde{T}$ is the total edge weight in the new chain.

Since $\{i,j\}$ is essential, every path from $i$ to $j$ must pass through $\{i,j\}$. Moreover,
because the transition probabilities of the new Markov chain and of the original one
coincide on $\mathcal{N}'$, the mean passage time $m_{ij}$ of the original Markov chain is equal to
the mean passage time $\tilde{m}_{ij}$ of the new chain. On the other hand, for the new chain, we can
write the mean return time to $j$ as
$$\tilde{m}_j^+ = 1 + \tilde{m}_{ij} = 1 + m_{ij},$$
which, since the mean return time to $j$ equals $1/\tilde{\pi}_j$, implies
$$m_{ij} = \frac{1}{\tilde{\pi}_j} - 1 = \frac{|N(i,j)|}{T_{ij}}.$$

The second part of the claim follows from the fact that all of the paths from $i$ to $k$
must pass through $\{i,j\}$, because it is the only edge connecting $N(i,j)$ to $N(j,i)$. Thus,
we conclude
$$m_{ik} = m_{ij} + m_{jk}.$$
□
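As a numerical sanity check of Lemma 2, the following minimal sketch builds a small symmetric (hence doubly stochastic) chain made of two triangles joined by a single essential edge, and compares the mean first passage times with the lemma. The example graph, the edge weight `q = 0.2`, and the helper names `solve`, `passage_times_to`, and `build_T` are assumptions made for illustration; part (a) is read here as $m_{ij} = |N(i,j)|/T_{ij}$.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def passage_times_to(T, j):
    """Mean first passage times m_kj into state j, for every start state k."""
    idx = [k for k in range(len(T)) if k != j]
    # m_kj = 1 + sum_{l != j} T_kl m_lj, i.e. (I - T restricted to idx) m = 1.
    A = [[(1.0 if r == c else 0.0) - T[r][c] for c in idx] for r in idx]
    x = solve(A, [1.0] * len(idx))
    m = [0.0] * len(T)
    for pos, k in enumerate(idx):
        m[k] = x[pos]
    return m


def build_T(n, edges, q):
    """Symmetric stochastic matrix: weight q per edge, the rest on self-loops."""
    T = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        T[u][v] = T[v][u] = q
    for u in range(n):
        T[u][u] = 1.0 - sum(T[u])
    return T


# Two triangles {0,1,2} and {3,4,5} joined by the essential edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
T = build_T(6, edges, q=0.2)
i, j = 2, 3
N_ij = [0, 1, 2]  # side of the partition containing i
N_ji = [3, 4, 5]  # side of the partition containing j

m_to = {k: passage_times_to(T, k) for k in N_ji}  # m_to[k][s] = m_{sk}
m_ij = m_to[j][i]
part_a = len(N_ij) / T[i][j]                      # |N(i,j)| / T_ij
part_b = [m_to[k][i] - m_to[k][j] for k in N_ji]  # m_ik - m_jk for k in N(j,i)
print(m_ij, part_a, part_b)
```

In this example both `m_ij` and `part_a` come out at 15 (up to rounding), and every entry of `part_b` matches `m_ij`, as part (b) predicts.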

We use the relation in Lemma 2 to study the effect of a single forceful link over an
essential edge on the excess influence of each agent.

Theorem 8. Let $\bar{\pi}$ denote the consensus distribution. Assume that there exists a single
pair $(i,j)$ for which the influence probability $\alpha_{ij} > 0$. Assume that the edge $\{i,j\}$ is an
essential edge of the social network graph. Then, we have for all $k$,



1 2 OiJl 



where 



and 



TTfc - - = — ^ ^ ^ij{k) 

n n2^_^(^(i + 2e)|Ar(^,j)|-|A^(j,0| 

g. . — Pij(^ij 

\N{^,J)\, keN{j,z), 
-\N{j,z)\, keN{i,j). 



^.,{k) = 

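Proof. Since the edge $\{i,j\}$ is essential, by Lemma 2 we have for every $k \in N(j,i)$
$$m_{ik} - m_{jk} = m_{ij} = \frac{|N(i,j)|}{T_{ij}} = \frac{2n|N(i,j)|}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}.$$

Similarly, for every $k \in N(i,j)$, we obtain
$$m_{ik} - m_{jk} = -m_{ji} = -\frac{2n|N(j,i)|}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}.$$

Combining the preceding relations, we can write for the relative mean passage time
$$m_{ik} - m_{jk} = \frac{2n}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}\,\Psi_{ij}(k),$$
where $\Psi_{ij}(k)$ equals $|N(i,j)|$ for $k \in N(j,i)$ and $-|N(j,i)|$ for $k \in N(i,j)$. Since
$(j,i)$ is the only forceful link, we can apply Theorem 7(b) to get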

n 1 — Qij/n^ 

where C,ij is given by 

Cij = ^^Y^ [(1 + 2e)m,j - mji] . 
Combining the above relations with Lemma 2(a) establishes the desired result. □

Theorem 8 shows that if two clusters of agents, e.g., two communities, are connected
via an essential edge over which there is a forceful link, then the excess influence of
all agents within the same cluster is equal (even when the cluster does not have any
symmetry properties). This implies that the opinions of all agents that are in the same
cluster as the forceful agent affect the consensus opinion of the society with the same
strength. This property is observed in part (a) of Example 2, in which edge $\{3,4\}$ is
an essential edge. Intuitively, all of the agents in that cluster will ultimately shape
the opinions of the forceful agent and this is the source of their excess influence. The
interesting and surprising feature is that they all have the same excess influence, even if
only some of them are directly connected to the forceful agent. Loosely speaking, this
can be explained using the fact that, in the limiting distribution, it is the consensus
among this cluster of agents that will impact the beliefs of the forceful agent, and since
within this cluster there are no other forceful agents, the consensus value among them
puts equal weight on each of them (recall Corollary 1).

5.3 Information Bottlenecks 

We now extend the ideas in Theorem 8 to more general societies. We observed in
Example 2 and Section 5.2 that influence over an essential edge can have global effects
on the consensus distribution since essential edges are "bottlenecks" of the information
flow in the network. In this subsection we generalize this idea to influential links over
bottlenecks that are not necessarily essential edges as defined in Definition 2. Our goal
is to study the impact of influential links over bottlenecks on the consensus distribution.

To achieve this goal, we return to the characterization in Theorem 7, which was in
terms of mean first passage times, and then provide a series of (successively tighter)
upper bounds on the key term $(m_{ik} - m_{jk})$ in Eq. (22) in this theorem. Our first
bound on this object will be in terms of the minimum normalized cut of a Markov chain
(induced by an undirected weighted graph), which is introduced in the next definition.
We will use the term cut of a Markov chain (or cut of an undirected graph) to denote
a partition of the set of states of a Markov chain (or equivalently the nodes of the
corresponding graph) into two sets.

Definition 3. Consider a Markov chain with set of states $\mathcal{N}$, symmetric transition
probability matrix $Z$, and stationary distribution $\pi$. The minimum normalized cut
value (or conductance) of the Markov chain, denoted by $\rho$, is defined as
$$\rho = \inf_{S \subset \mathcal{N}} \frac{Q(S, S^c)}{\pi(S)\,\pi(S^c)}, \tag{27}$$
where $Q(A, B) = \sum_{i \in A,\, j \in B} \pi_i Z_{ij}$ and $\pi(S) = \sum_{i \in S} \pi_i$. We refer to the cut that achieves
the minimum in this optimization problem as the minimum normalized cut.



The objective in the optimization problem in (27) is the (normalized) conditional
probability that the Markov chain makes a transition from a state in set $S$ to a state
in set $S^c$ given that the initial state is in $S$. The minimum normalized cut therefore
characterizes how fast the Markov chain will escape from any part of the state space,
and hence is an appropriate measure of information bottlenecks or the mixing time of the
underlying graph. Clearly, the minimum normalized cut value is larger in more connected
graphs.
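To make the definition concrete, here is a minimal brute-force sketch that evaluates the objective $Q(S, S^c)/(\pi(S)\pi(S^c))$ over all cuts of a small chain. The two example graphs, the edge weight `0.2`, and the helper names `build_T` and `min_normalized_cut` are assumptions for illustration only; exhaustive enumeration is practical only for very small state spaces.

```python
from itertools import combinations


def build_T(n, edges, q):
    """Symmetric stochastic matrix: weight q per edge, the rest on self-loops."""
    T = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        T[u][v] = T[v][u] = q
    for u in range(n):
        T[u][u] = 1.0 - sum(T[u])
    return T


def min_normalized_cut(Z, pi):
    """Brute-force rho = min_S Q(S, S^c) / (pi(S) pi(S^c)) over proper subsets S."""
    n = len(Z)
    best_val, best_S = float("inf"), None
    # Keep state 0 inside S so that each partition {S, S^c} is visited once.
    for size in range(n - 1):
        for rest in combinations(range(1, n), size):
            S = {0, *rest}
            Sc = set(range(n)) - S
            Q = sum(pi[i] * Z[i][j] for i in S for j in Sc)
            val = Q / (sum(pi[i] for i in S) * sum(pi[i] for i in Sc))
            if val < best_val:
                best_val, best_S = val, S
    return best_val, best_S


uniform = [1.0 / 6] * 6

# Two triangles joined by one bridge edge: a clear bottleneck.
two_triangles = build_T(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)], 0.2)
rho_bottleneck, S_bottleneck = min_normalized_cut(two_triangles, uniform)

# Complete graph on the same six nodes: no bottleneck.
complete = build_T(6, list(combinations(range(6), 2)), 0.2)
rho_complete, _ = min_normalized_cut(complete, uniform)

print(rho_bottleneck, S_bottleneck, rho_complete)
```

In this example the minimizing cut for the first graph is the bridge (one triangle against the other), and the complete graph has a markedly larger minimum normalized cut value, in line with the remark that better connected graphs have larger values.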

The next lemma provides a relation between the maximum mean commute time of
a Markov chain (induced by an undirected graph) and the minimum normalized cut of
the chain, which is presented in Section 5.3 of Aldous and Fill [2]. This result will then
be used in the next theorem to provide an improved bound on the excess influences by
using the fact that $|m_{ik} - m_{jk}| \le \max\{m_{ij}, m_{ji}\}$ (see, in particular, the proof of Theorem 9).



Lemma 3. Consider an $n$-state Markov chain with transition matrix $Z$ and stationary
distribution $\pi$. Let $\rho$ denote the minimum normalized cut value of the Markov chain
(cf. Definition 3). The maximum mean commute time satisfies the following relation:
$$\max_{i,j}\,\{m_{ij} + m_{ji}\} \le \frac{3n \log n}{\rho}. \tag{28}$$



We use the preceding relation together with our characterization of excess influence
in terms of mean first passage times in Theorem 7 to obtain a tighter upper bound on the
$\ell_\infty$ norm of excess influence than in Theorem 5. This result, which is stated next, both
gives a more directly interpretable limit on the extent of misinformation in the society
and also shows the usefulness of the characterization in terms of mean first passage times
in Theorem 7.

Theorem 9. Let $\bar{\pi}$ denote the consensus distribution. Then, we have



where $\rho$ is the minimum normalized cut value of the Markov chain with transition
probability matrix given by the social network matrix $T$ (cf. Definition 3).

Proof. By Theorem 7 we have for every $k$



ED. 






1 

TTfc 




((1 - 2e)7ii + nj)\mik - rrijk 



n 



< 




- rrijk 




(29) 
where (29) holds because $m_{ik} \le m_{ij} + m_{jk}$ and $m_{jk} \le m_{ji} + m_{ik}$, and the last inequality
follows from Eq. (28) and the fact that $\pi = \frac{1}{n}e$. □

One advantage of the result in this theorem is that the bound is in terms of $\rho$, the
minimum normalized cut of the social network graph. As emphasized in Definition 3,
this notion is related to the strength of (indirect) communication links in the society.
Although the bound in Theorem 9 is tighter than the one we provided in Theorem 5, it
still leaves some local information unexploited because it focuses on the maximum mean
commute times between all states of a Markov chain. The following example shows how
this bound may be improved further by focusing on the mean commute time between
the forceful and the influenced agents.

Example 3. (Barbell graph) The barbell graph consists of two complete graphs, each
with $n_1$ nodes, that are connected via a path that consists of $n_2$ nodes (cf. Figure 2).
Consider the asymptotic behavior
$$n \to \infty, \quad n_1/n \to \nu, \quad n_2/n \to 1 - 2\nu,$$
where $n = 2n_1 + n_2$ denotes the total number of nodes in the barbell graph, and
$0 < \nu < \frac{1}{2}$. The mean first passage time from a particular node in the left bell to a node in
the right bell is $O(n^3)$ as $n \to \infty$, while the mean passage time between any two nodes
in each bell is $O(n)$ (see Chapter 5 of [2] for exact results). Consider a situation where
there is a single forceful link in the left bell.

The minimum normalized cut for this example is given by cut $C_0$, with normalized
cut value $O(1/n)$, which captures the bottleneck in the global network structure. Since
the only forceful agent is within the left bell in this example, we expect the flow of
information to be limited by cuts that separate the forceful and the influenced agent,
and partition the left bell. Since the left bell is a complete graph, the cuts associated
with this part of the graph will have higher normalized cut values, thus yielding tighter
bounds on the excess influence of the agents. In what follows, we consider bounds in
terms of "relative cuts" in the social network graph that separate forceful and influenced
agents in order to capture bottlenecks in the spread of misinformation (for example, cuts
$C_1$, $C_2$, and $C_3$ in Figure 2).

5.4 Relative Cuts 

The objective of this section is to improve our characterization of the extent of
misinformation in terms of information bottlenecks. To achieve this objective, we introduce
a new concept, relative cuts, and then show how this new concept is useful to derive
improved upper bounds on the excess influence of different individuals and on the extent of
misinformation. Our strategy is to develop tighter bounds on the mean commute times
between the forceful and influenced agents in terms of relative cut values. Together with
Theorem 7, this enables us to provide bounds on the excess influence as a function of
the properties of the social network graph and the location of the forceful agents within
it.

Figure 2: The barbell graph with $n_1 = 8$ nodes in each bell and $n_2 = 4$. There is a single
forceful link, represented by a directed link in the left bell.

Definition 4. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be an undirected graph with edge weights given by
$w_{ij}$. The minimum relative cut value between $a$ and $b$, denoted by $c_{ab}$, is defined as
$$c_{ab} = \inf\Big\{ \sum_{i \in S,\, j \in S^c} w_{ij} \;\Big|\; S \subset \mathcal{N},\ a \in S,\ b \notin S \Big\}.$$
We refer to the cut that achieves the minimum in this optimization problem as the
minimum relative cut.

The next theorem uses the extremal characterization of the mean commute times
presented in Appendix D, Lemma 11, to provide bounds on the mean commute times in
terms of minimum relative cut values.

Theorem 10. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be the social network graph induced by the social network
matrix $T$ and consider a Markov chain with transition matrix $T$. For any $a, b \in \mathcal{N}$, the
mean commute time between $a$ and $b$ satisfies
$$\frac{n}{c_{ab}} \le m_{ab} + m_{ba} \le \frac{n^2}{c_{ab}}, \tag{30}$$
where $c_{ab}$ is the minimum relative cut value between $a$ and $b$ (cf. Definition 4).

Proof. For the lower bound we exploit the extremal characterization of the mean
commute time given by Eq. (54) in Lemma 11. For any $S \subset \mathcal{N}$ containing $a$ and not
containing $b$, pick the function $g_S$ as follows:
$$g_S(l) = \begin{cases} 0, & l \in S; \\ 1, & \text{otherwise.} \end{cases}$$
The function $g_S$ is feasible for the maximization problem in Eq. (54). Hence,
$$m_{ab} + m_{ba} \ge \frac{1}{\mathcal{E}(g_S, g_S)} = \frac{n}{\sum_{i \in S} \sum_{j \in S^c} T_{ij}}, \quad \text{for all } S \subset \mathcal{N},\ a \in S,\ b \notin S.$$
The tightest lower bound can be obtained by taking the largest right-hand side in the
above relation, which gives the desired lower bound.

For the upper bound, similar to Proposition 2 in Chapter 4 of [2], we use the second
characterization of the mean commute time presented in Lemma 11. Note that any unit
flow from $a$ to $b$ is feasible in the minimization problem in Eq. (55). The max-flow min-cut
theorem implies that there exists a flow $f^*$ of size $c_{ab}$ from $a$ to $b$ such that $|f^*_{ij}| \le T_{ij}$ for
all edges $\{i,j\} \in \mathcal{A}$. Therefore, there exists a unit flow $f = f^*/c_{ab}$ from $a$ to $b$ such
that $|f_{ij}| \le T_{ij}/c_{ab}$ for all edges $\{i,j\} \in \mathcal{A}$. By deleting flows around cycles we may assume
that
$$\sum_{j=1}^{n} |f_{ij}| \le \begin{cases} 1, & \text{if } i = a, b, \\ 2, & \text{otherwise.} \end{cases} \tag{31}$$
Therefore, by invoking Lemma 11 from Appendix D, we obtain
$$m_{ab} + m_{ba} \le \Big(\sum_{i,j} T_{ij}\Big) \sum_{\{i,j\} \in \mathcal{A}} \frac{f_{ij}^2}{T_{ij}} \le \frac{n}{c_{ab}} \sum_{\{i,j\} \in \mathcal{A}} |f_{ij}| \le \frac{n^2}{c_{ab}},$$
where the last inequality follows from (31). □
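The two-sided bound can be checked numerically on a small example. The sketch below is illustrative only: it assumes the bound reads $n/c_{ab} \le m_{ab} + m_{ba} \le n^2/c_{ab}$, uses a hypothetical six-node graph (two triangles joined by a bridge, edge weight `0.2`), computes the minimum relative cut by exhaustive enumeration, and obtains the commute time by solving the first-passage linear systems. All function names are assumptions.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def commute_time(T, a, b):
    """Mean commute time m_ab + m_ba for the chain with transition matrix T."""
    def passage(src, dst):
        idx = [k for k in range(len(T)) if k != dst]
        A = [[(1.0 if r == c else 0.0) - T[r][c] for c in idx] for r in idx]
        return solve(A, [1.0] * len(idx))[idx.index(src)]
    return passage(a, b) + passage(b, a)


def min_relative_cut(T, a, b):
    """Brute-force c_ab: lightest cut with a on one side and b on the other."""
    n = len(T)
    others = [k for k in range(n) if k not in (a, b)]
    best = float("inf")
    for size in range(len(others) + 1):
        for rest in combinations(others, size):
            S = {a, *rest}
            cut = sum(T[i][j] for i in S for j in range(n) if j not in S)
            best = min(best, cut)
    return best


# Two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}; weight 0.2 per edge.
n = 6
T = [[0.0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    T[u][v] = T[v][u] = 0.2
for u in range(n):
    T[u][u] = 1.0 - sum(T[u])  # self-loops make every row sum to one

a, b = 0, 5
c_ab = min_relative_cut(T, a, b)
commute = commute_time(T, a, b)
lower, upper = n / c_ab, n * n / c_ab
print(lower, commute, upper)
```

For this graph the minimum relative cut is the bridge, and the commute time of about 70 lies between the lower bound of about 30 and the upper bound of about 180, which illustrates how loose the upper bound can be when the cut is unbalanced.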

The minimum relative cut for the barbell graph in Example 3 is given by cut $C_1$ with
relative cut value $O(1)$. An alternative relative cut between the forceful and influenced
agents that partitions the left bell is cut $C_3$, which has relative cut value $O(n)$, and
therefore yields a tighter bound on the mean commute times. Comparing cut $C_1$ to cut
$C_3$, we observe that $C_3$ is a balanced cut, i.e., it partitions the graph into parts each with
a fraction of the total number of nodes, while cut $C_1$ is not balanced. In order to avoid
unbalanced cuts, we introduce the notion of a normalized relative cut between two nodes,
which is a generalization of the normalized cut presented in Definition 3.

Definition 5. Consider a Markov chain with set of states $\mathcal{N}$, transition probability
matrix $Z$, and stationary distribution $\pi$. The minimum normalized relative cut value
between $a$ and $b$, denoted by $\rho_{ab}$, is defined as
$$\rho_{ab} = \inf_{S \subset \mathcal{N}} \Big\{ \frac{Q(S, S^c)}{\pi(S)\,\pi(S^c)} \;\Big|\; a \in S,\ b \notin S \Big\},$$
where $Q(A, B) = \sum_{i \in A,\, j \in B} \pi_i Z_{ij}$ and $\pi(S) = \sum_{i \in S} \pi_i$. We refer to the cut that achieves
the minimum in this optimization problem as the minimum normalized relative cut.

The next theorem provides a bound on the mean commute time between two nodes
$a$ and $b$ as a function of the minimum normalized relative cut value between $a$ and $b$.

Theorem 11. Consider a Markov chain with set of states $\mathcal{N}$, transition probability
matrix $Z$, and uniform stationary distribution. For any $a, b \in \mathcal{N}$, we have
$$m_{ab} + m_{ba} \le \frac{3n \log n}{\rho_{ab}},$$
where $\rho_{ab}$ is the minimum normalized relative cut value between $a$ and $b$ (cf. Definition 5).

Proof. We present a generalization of the proof of Lemma 3 by Aldous and Fill [2] for
the notion of normalized relative cuts. The proof relies on the characterization of the
mean commute time given by Lemma 11 in Appendix D. For a function $0 \le g \le 1$ with
$g(a) = 0$ and $g(b) = 1$, order the nodes as $a = 1, 2, \ldots, n = b$ so that $g$ is increasing.
The Dirichlet form can be written as
$$\begin{aligned}
\mathcal{E}(g,g) &= \sum_{i} \sum_{k > i} \pi_i Z_{ik}\,(g(k) - g(i))^2 \\
&\ge \sum_{i} \sum_{k > i} \sum_{i \le j < k} \pi_i Z_{ik}\,(g(j+1) - g(j))^2 \\
&= \sum_{j=1}^{n-1} (g(j+1) - g(j))^2\, Q(A_j, A_j^c) \\
&\ge \sum_{j=1}^{n-1} (g(j+1) - g(j))^2\, \rho_{ab}\, \pi(A_j)\,\pi(A_j^c),
\end{aligned} \tag{32}$$
where $A_j = \{1, 2, \ldots, j\}$, and the last inequality is true by Definition 5. On the other
hand, we have
$$1 = g(b) - g(a) = \sum_{j=1}^{n-1} (g(j+1) - g(j))\,\big(\rho_{ab}\,\pi(A_j)\,\pi(A_j^c)\big)^{1/2}\,\big(\rho_{ab}\,\pi(A_j)\,\pi(A_j^c)\big)^{-1/2}.$$
Using the Cauchy-Schwarz inequality and Eq. (32), we obtain
$$\frac{1}{\mathcal{E}(g,g)} \le \frac{1}{\rho_{ab}} \sum_{j=1}^{n-1} \frac{1}{\pi(A_j)\,\pi(A_j^c)}. \tag{33}$$
But $\pi(A_j) = j/n$, because the stationary distribution of the Markov chain is uniform.
Thus,
$$\sum_{j=1}^{n-1} \frac{1}{\pi(A_j)\,\pi(A_j^c)} = \sum_{j=1}^{n-1} \frac{n^2}{j(n-j)} \le 3n \log n.$$
Therefore, by applying the above relation to Eq. (33) we conclude
$$\frac{1}{\mathcal{E}(g,g)} \le \frac{3n \log n}{\rho_{ab}}.$$
The above relation is valid for every function $g$ feasible for the maximization problem
in Eq. (54). Hence, the desired result follows from the extremal characterization of the
mean commute time given by Lemma 11. □

Note that the minimum normalized cut value of a Markov chain in Definition 3 can
be related to normalized relative cut values as follows:
$$\rho = \inf_{a \neq b \in \mathcal{N}} \rho_{ab}.$$
Therefore, the upper bound given in Theorem 11 for the mean commute time is always
at least as tight as that provided in Lemma 3.

Let us now examine our new characterization in the context of Example 3. The
minimum normalized relative cut is given by cut $C_2$ with (normalized relative cut) value
$O(1)$. Despite the fact that $C_2$ is a balanced cut with respect to the entire graph, it
is not a balanced cut in the left bell. Therefore, it yields a worse upper bound on
mean commute times compared to cut $C_3$ [which has value $O(n)$]. These considerations
motivate us to consider balanced cuts within subsets of the original graph. In the
following we obtain tighter bounds on the mean commute times by considering relative
cuts in a subset of the original graph.

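Definition 6. Consider a weighted undirected graph, $(\mathcal{N}, \mathcal{A})$, with edge $\{i,j\}$ weight
given by $w_{ij}$. For any $S \subset \mathcal{N}$, we define the subgraph of $(\mathcal{N}, \mathcal{A})$ with respect to $S$ as
a weighted undirected graph, denoted by $(S, \mathcal{A}_S)$, where $\mathcal{A}_S$ contains all edges of the
original graph connecting nodes in $S$ with the following weights
$$\tilde{w}_{ij} = \begin{cases} w_{ij}, & i \neq j; \\ w_{ii} + \sum_{k \in S^c} w_{ik}, & i = j. \end{cases}$$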

The next lemma uses the Monotonicity Law presented in Appendix D, Lemma 12, to
relate the mean commute times within a subgraph to the mean commute times of the
original graph.

Lemma 4. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be an undirected graph with edge $\{i,j\}$ weight given by
$w_{ij}$. Consider a Markov chain induced by this graph and denote the mean first passage
times between states $i$ and $j$ by $m_{ij}$. We fix nodes $a, b \in \mathcal{N}$, and $S \subseteq \mathcal{N}$ containing $a$
and $b$. Consider a subgraph of $(\mathcal{N}, \mathcal{A})$ with respect to $S$ (cf. Definition 6) and let $\tilde{m}_{ij}$
denote the mean first passage time between states $i$ and $j$ for the Markov chain induced
by this subgraph. We have,
$$m_{ab} + m_{ba} \le \frac{w}{w(S)}\,(\tilde{m}_{ab} + \tilde{m}_{ba}),$$
where $w$ is the total edge weight of the original graph, and $w(S)$ is the total edge weight
of the subgraph, i.e., $w(S) = \sum_{i \in S} \sum_{j \in S} \tilde{w}_{ij}$.
Proof. Consider an undirected graph $(\mathcal{N}, \mathcal{A})$ with modified edge weights $\hat{w}_{ij}$ given by
$$\hat{w}_{ij} = \begin{cases} w_{ij}, & i \neq j \in S, \text{ or } i \neq j \in S^c; \\ 0, & i \in S,\ j \in S^c. \end{cases}$$
Hence, $\hat{w}_{ij} \le w_{ij}$ for all $i \neq j$, but the total edge weight $w$ remains unchanged. By the
Monotonicity Law (cf. Lemma 12), the mean commute time in the original graph is
bounded by that of the modified graph, i.e.,
$$m_{ab} + m_{ba} \le \hat{m}_{ab} + \hat{m}_{ba}. \tag{34}$$

The mean commute time in the modified graph can be characterized using Lemma
11 in terms of the Dirichlet form. In particular,



{rriab + mba) ^ = inf \- Y] Wij{g{i) - g{j)Y : g{a) = 0,g{b) = l| 

= nKi{-E^^:'(^(^)-^(^'))'^ g{a)=0,g{b) = l} 

i,jes 

- ,M, { E JfrfeW - surf : = o.aib) = i} 



W 0<g<l I ^ w(S) 

w(S) , _ ,1 

= {mab + mba) , 

w 

where the second equality holds by definition of w, and the last equality is given by
definition of w, and the extremal characterization of the mean commute time in the
subgraph. The desired result is established by combining the above relation with (34).

□ 
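The subgraph construction of Definition 6 and the bound of Lemma 4 can be illustrated numerically. The sketch below is a hedged example under stated assumptions: the lemma is read as $m_{ab} + m_{ba} \le \frac{w}{w(S)}(\tilde{m}_{ab} + \tilde{m}_{ba})$, the nine-node graph and its weights are hypothetical, and the helper names (`weights`, `subgraph`, `transition`, `commute_time`) are introduced only for this illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def commute_time(P, a, b):
    """Mean commute time m_ab + m_ba for the chain with transition matrix P."""
    def passage(src, dst):
        idx = [k for k in range(len(P)) if k != dst]
        A = [[(1.0 if r == c else 0.0) - P[r][c] for c in idx] for r in idx]
        return solve(A, [1.0] * len(idx))[idx.index(src)]
    return passage(a, b) + passage(b, a)


def weights(n, edges):
    """Symmetric weight matrix with unit row sums (self-loops take the slack)."""
    W = [[0.0] * n for _ in range(n)]
    for (u, v), q in edges.items():
        W[u][v] = W[v][u] = q
    for u in range(n):
        W[u][u] = 1.0 - sum(W[u])
    return W


def subgraph(W, S):
    """Keep the weights inside S and fold edges leaving S into self-loops."""
    outside = [k for k in range(len(W)) if k not in S]
    sub = [[W[i][j] for j in S] for i in S]
    for pos, i in enumerate(S):
        sub[pos][pos] += sum(W[i][k] for k in outside)
    return sub


def transition(W):
    """Random walk on a weighted graph: P_ij = w_ij / sum_k w_ik."""
    return [[x / sum(row) for x in row] for row in W]


# Three triangles in a row; the third hangs off node 5 through a weak edge.
edges = {(0, 1): 0.2, (0, 2): 0.2, (1, 2): 0.2, (2, 3): 0.2,
         (3, 4): 0.2, (3, 5): 0.2, (4, 5): 0.2, (5, 6): 0.05,
         (6, 7): 0.2, (6, 8): 0.2, (7, 8): 0.2}
W = weights(9, edges)
S = [0, 1, 2, 3, 4]            # subset containing a and b
a, b = 0, 4

W_S = subgraph(W, S)
w, w_S = sum(map(sum, W)), sum(map(sum, W_S))
original = commute_time(transition(W), a, b)
inside = commute_time(transition(W_S), S.index(a), S.index(b))
bound = w / w_S * inside
print(original, inside, bound)
```

Here the commute time in the full graph (about 105) stays below the rescaled subgraph commute time (about 120). Dropping node 5 from `S` removes a parallel route between nodes 3 and 4, which is why the bound is strict in this example.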

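Theorem 12. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be the social network graph induced by the social network
matrix $T$ and consider a Markov chain with transition matrix $T$. For any $a, b \in \mathcal{N}$, and
any $S \subseteq \mathcal{N}$ containing $a$ and $b$, we have
$$m_{ab} + m_{ba} \le \frac{3n \log |S|}{\rho_{ab}(S)},$$
where $\rho_{ab}(S)$ is the minimum normalized cut value between $a$ and $b$ on the subgraph of
$(\mathcal{N}, \mathcal{A})$ with respect to $S$, i.e.,
$$\rho_{ab}(S) = \inf_{S' \subset S} \Big\{ \frac{|S| \sum_{i \in S',\, j \in S \setminus S'} T_{ij}}{|S'| \cdot |S \setminus S'|} \;\Big|\; a \in S',\ b \notin S' \Big\}. \tag{35}$$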
Proof. By Lemma 4 we have
$$m_{ab} + m_{ba} \le \frac{w}{w(S)}\,(\tilde{m}_{ab} + \tilde{m}_{ba}) = \frac{n}{|S|}\,(\tilde{m}_{ab} + \tilde{m}_{ba}), \tag{36}$$
where $\tilde{m}_{ab}$ is the mean first passage time on the subgraph $(S, \mathcal{A}_S)$.

On the other hand, Definition 6 implies that for the subgraph $(S, \mathcal{A}_S)$, we have for
every $i \in S$
$$\sum_{j \in S} \tilde{w}_{ij} = \sum_{j \in \mathcal{N}} T_{ij} = 1.$$
Hence, the stationary distribution of the Markov chain on the subgraph is uniform.
Therefore, we can apply Theorem 11 to relate the mean commute time within the subgraph
$(S, \mathcal{A}_S)$ to its normalized relative cuts, i.e.,
$$\tilde{m}_{ab} + \tilde{m}_{ba} \le \frac{3|S| \log |S|}{\rho_{ab}(S)},$$
where $\rho_{ab}(S)$ is the minimum normalized cut between $a$ and $b$ given by Definition 5 on
the subgraph. Since the stationary distribution of the random walk on the subgraph is
uniform, we can rewrite $\rho_{ab}(S)$ as in (35). Combining the above inequality with Eq. (36)
establishes the theorem. □

Theorem 12 states that if the local neighborhood around the forceful links is highly
connected, the mean commute times between the forceful and the influenced agents
will be small, implying a smaller excess influence for all agents, hence limited spread of
misinformation in the society. The economic intuition for this result is similar to that for
our main characterization theorems: forceful agents get (their limited) information intake
from their local neighborhoods. When these local neighborhoods are also connected
to the rest of the network, forceful agents will be indirectly influenced by the rest of
the society and this will limit the spread of their (potentially extreme) opinions. In
contrast, when their local neighborhoods obtain most of their information from the
forceful agents, the opinions of these forceful agents will be reinforced (rather than
moderated) and this can significantly increase their excess influence and the potential
spread of misinformation.

Let us revisit Example 3 and apply the result of Theorem 12, where the selected
subgraph is the left cluster of nodes. The left bell is approximately a complete graph.
We observe that the minimum normalized cut in the subgraph would be of the form of
$C_3$ in Figure 2, and hence the upper bound on the mean commute time between $i$ and $j$
is $O(n \log n)$, which is close to the mean commute time on a complete graph of size $n$.

Note that it is possible to obtain the tightest upper bound on the mean commute time
between two nodes by minimizing the bound in Theorem 12 over all subgraphs $S$ of the
social network graph. However, exhaustive search over all subgraphs is not appealing
from a computational point of view. Intuitively, for any two particular nodes, the goal
is to identify whether such nodes are highly connected by identifying a cluster of nodes
containing them, or a bottleneck that separates them. In the following section we present
a hierarchical clustering method to obtain such a cluster using a recursive approach.

5.4.1 Graph Clustering 

We next present a graph clustering method to provide tighter bounds on the mean
commute time between two nodes $a$ and $b$ by systematically searching over subgraphs $S$
of the social network graph that would yield improved normalized cut values. The goal
of this exercise is again to improve the bounds on the term $(m_{ik} - m_{jk})$ in Eq. (22) in
Theorem 7.

The following algorithm is based on successive graph cutting using the notion of
minimum normalized cut value defined in Definition 3. This approach is similar to
the graph partitioning approach of Shi and Malik [34] applied to image segmentation
problems.

Algorithm 1. Fix nodes $a$, $b$ on the social network graph $(\mathcal{N}, \mathcal{A})$. Perform the following
steps:

1. $k = 0$, $S_k = \mathcal{N}$.

2. Define $\rho_k$ as
$$\rho_k = \inf_{S \subset S_k} \frac{|S_k| \sum_{i \in S,\, j \in S_k \setminus S} T_{ij}}{|S| \cdot |S_k \setminus S|},$$
with $S_k^*$ as an optimal solution.

3. If $a, b \in S_k^*$, then $S_{k+1} = S_k^*$; $k \leftarrow k + 1$; go to 2.

4. If $a, b \in S_k \setminus S_k^*$, then $S_{k+1} = S_k \setminus S_k^*$; $k \leftarrow k + 1$; go to 2.

5. Return $\dfrac{3n \log |S_k|}{\rho_k}$.

Figure 3 illustrates the steps of Algorithm 1 for a highly clustered graph. Each of the
regions in Figure 3 represents a highly connected subgraph. We observe that the global
cut given by $S_1$ does not separate $a$ and $b$, so it need not give a tight characterization of
the bottleneck between $a$ and $b$. Nevertheless, $S_1$ gives a better estimate of the cluster
containing $a$ and $b$. Repeating the above steps, the cluster size reduces until we obtain
a normalized cut separating $a$ and $b$. By Theorem 12, this cut provides a bound on
the mean commute time between $a$ and $b$ that characterizes the bottleneck between
such nodes. So far, we have seen in this example and Example 2 that graph clustering
via recursive partitioning can monotonically improve upon the bounds on the excess
influence (cf. Theorem 12). Unfortunately, that is not always the case, as discussed in
the following example. In fact, we need further assumptions on the graph in order to
obtain monotone improvement via graph clustering.
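The recursive procedure of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: step 2 is read as minimizing $|S_k| \sum_{i \in S, j \in S_k \setminus S} T_{ij} / (|S| \cdot |S_k \setminus S|)$, step 5 as returning $3n \log|S_k| / \rho_k$ with the natural logarithm, each cut is found by exhaustive enumeration (feasible only for small graphs), and the graph is assumed connected. The example graph and the names `best_cut` and `cluster_bound` are hypothetical.

```python
import math
from itertools import combinations


def best_cut(nodes, w):
    """Minimum normalized cut of the subgraph on `nodes`, by brute force."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best_val, best_S = float("inf"), None
    # Keep `first` inside S so that each partition is visited once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            S = {first, *extra}
            comp = set(nodes) - S
            cut = sum(w.get(frozenset((i, j)), 0.0) for i in S for j in comp)
            val = len(nodes) * cut / (len(S) * len(comp))
            if val < best_val:
                best_val, best_S = val, S
    return best_val, best_S


def cluster_bound(n, w, a, b):
    """Shrink the cluster around a and b until the best cut separates them."""
    S_k = set(range(n))
    while True:
        rho_k, S_star = best_cut(S_k, w)
        comp = S_k - S_star
        if a in S_star and b in S_star:      # step 3
            S_k = S_star
        elif a in comp and b in comp:        # step 4
            S_k = comp
        else:                                # step 5: the cut separates a and b
            return 3 * n * math.log(len(S_k)) / rho_k, S_k, rho_k


# Three triangles in a row; the third hangs off node 5 through a weak edge.
pairs = {(0, 1): 0.2, (0, 2): 0.2, (1, 2): 0.2, (2, 3): 0.2,
         (3, 4): 0.2, (3, 5): 0.2, (4, 5): 0.2, (5, 6): 0.05,
         (6, 7): 0.2, (6, 8): 0.2, (7, 8): 0.2}
w = {frozenset(e): q for e, q in pairs.items()}

bound, S_final, rho = cluster_bound(9, w, a=0, b=4)
rho_global, _ = best_cut(range(9), w)
global_bound = 3 * 9 * math.log(9) / rho_global
print(S_final, rho, bound, global_bound)
```

In this example the first cut peels off the weakly attached triangle without separating `a` and `b`, and the second cut (the bridge between the remaining two triangles) separates them, giving a bound several times smaller than the one based on the global minimum normalized cut.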
Figure 4: Social network graph with a central hub

Example 4. Consider a social network graph of size $n$ depicted in Figure 4. The central
region is a complete graph of size $n/2$. Each of the $k$ clusters on the cycle is a complete
graph of size $n/(2k)$, which is connected to the central hub via edges of total weight $h$.
Moreover, the clusters on the cycle are connected with total edge weight $r$.

If $r > kh/8$, then $C_0$ would be the minimum normalized cut rather than cuts of the
form $C_1$. Hence, $\rho_0$ in step 2 of Algorithm 1 is given by
$$\rho_0 = \frac{n \cdot kh}{\frac{n}{2} \cdot \frac{n}{2}} = \frac{4kh}{n}.$$

After removing the central cluster, we obtain $C_2$ as the minimum normalized cut
over the cycle, with the following value
$$\rho_1 = \frac{\frac{n}{2} \cdot 2r}{\frac{n}{4} \cdot \frac{n}{4}} = \frac{16r}{n}.$$

Therefore, we conclude that $\rho_1 < \rho_0$ if and only if $\frac{kh}{8} < r < \frac{kh}{4}$, i.e., the upper bound
obtained by Algorithm 1 on the mean commute time between $a$ and $b$ is not smaller than
that of Lemma 3. That is because by removing the central cluster, we have eliminated
the possibility of reaching the destination via shortcuts of the central hub, and the only
way to reach the destination is to walk through the cycle.

Next, we show that the bounds given by Algorithm 1 are monotonically improving
if the successive cuts are disjoint.

Definition 7. Consider an undirected graph $(\mathcal{N}, \mathcal{A})$. The cuts defined by $S_1, S_2 \subset \mathcal{N}$
are disjoint with respect to $\mathcal{N}$ if
$$\delta(S_1) \cap \delta(S_2) = \emptyset,$$
where
$$\delta(S) = \big\{ \{i,j\} \in \mathcal{A} \;\big|\; i \in S,\ j \in S^c \big\}.$$

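Theorem 13. Let $\rho_k$ and $S_k$ be generated by the $k$-th iteration of running Algorithm
1 on the social network graph $(\mathcal{N}, \mathcal{A})$. If the cuts corresponding to $S_{k+1}$ and $S_{k+2}$ are
disjoint with respect to $S_k$, then $\rho_{k+1} \ge \rho_k$.

Proof. By definition of $\rho_k$ in step 2 of Algorithm 1 we have for $S_{k+2} \subset S_k$
$$\rho_k = \frac{|S_k| \sum_{i \in S_{k+1},\, j \in S_k \setminus S_{k+1}} T_{ij}}{|S_{k+1}| \cdot |S_k \setminus S_{k+1}|} \le \frac{|S_k| \sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+2}} T_{ij}}{|S_{k+2}| \cdot |S_k \setminus S_{k+2}|}. \tag{37}$$

But $S_{k+1}$ and $S_{k+2}$ are disjoint with respect to $S_k$, and $S_{k+2} \subset S_{k+1} \subset S_k$. It is
straightforward to show that
$$\big\{ \{i,j\} \in \mathcal{A} \;\big|\; i \in S_{k+2},\ j \in S_k \setminus S_{k+1} \big\} \subset \delta(S_{k+1}) \cap \delta(S_{k+2}) = \emptyset,$$
which implies
$$\sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+2}} T_{ij} = \sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+1}} T_{ij} + \sum_{i \in S_{k+2},\, j \in S_{k+1} \setminus S_{k+2}} T_{ij} = \sum_{i \in S_{k+2},\, j \in S_{k+1} \setminus S_{k+2}} T_{ij}.$$

Therefore, by combining the above relation with (37) and the definition of $\rho_{k+1}$, we
obtain
$$\begin{aligned}
\frac{\rho_{k+1}}{\rho_k} &\ge \left( \frac{|S_{k+1}|}{|S_{k+2}| \cdot |S_{k+1} \setminus S_{k+2}|} \right) \left( \frac{|S_k|}{|S_{k+2}| \cdot |S_k \setminus S_{k+2}|} \right)^{-1} \\
&= \frac{|S_{k+1}| \,\big( |S_k \setminus S_{k+1}| + |S_{k+1} \setminus S_{k+2}| \big)}{|S_{k+1} \setminus S_{k+2}| \,\big( |S_{k+1}| + |S_k \setminus S_{k+1}| \big)} \\
&= \left( 1 + \frac{|S_k \setminus S_{k+1}|}{|S_{k+1} \setminus S_{k+2}|} \right) \left( 1 + \frac{|S_k \setminus S_{k+1}|}{|S_{k+1}|} \right)^{-1}
\end{aligned} \tag{38}$$
$$\ge 1,$$
where (38) holds because $S_{k+2} \subset S_{k+1} \subset S_k$, and the last inequality is true because
$S_{k+1} \setminus S_{k+2} \subset S_{k+1}$, and $S_{k+2}$ is nonempty. □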

6 Conclusions 

This paper analyzed the spread of misinformation in large societies. Our analysis is 
motivated by the widespread differences in beliefs across societies and more explicitly, 
the presence of many societies in which beliefs that appear to contradict the truth can 
be widely held. We argued that the possibility that such misinformation can arise and 
spread is the manifestation of the natural tension between information aggregation and 
misinformation spreading in the society. 

We modeled a society as a social network of agents communicating (meeting) with 
each other. Each individual holds a belief represented by a scalar. Individuals meet 
pairwise and exchange information, which is modeled as both individuals adopting the 
average of their pre-meeting beliefs. When all individuals engage in this type of infor- 
mation exchange, the society will be able to aggregate the initial information held by all 
individuals. This effective information aggregation forms the benchmark against which 
we compared the possible spread of misinformation. 

Misinformation is introduced by allowing some agents to be "forceful," meaning that
they influence the beliefs of (some of) the other individuals they meet, but do not
change their own opinion. When the influence of forceful agents is taken into account,
this defines a stochastic process for belief evolution, and our analysis exploited the fact
that this stochastic process (Markov chain) can be decomposed into a part induced by
the social network matrix and a part corresponding to the influence matrix.

Under the assumption that even forceful agents obtain some information (however
infrequent) from some others, we first showed that beliefs in this class of societies converge
to a consensus among all individuals (under some additional weak regularity conditions).
This consensus value is a random variable, and the bulk of our analysis characterizes its
behavior, in particular, providing bounds on how much this consensus can differ from
the efficient information aggregation benchmark.

We presented three sets of results. Our first set of results quantifies the extent of
misinformation in the society as a function of the number and properties of forceful agents
and the mixing properties of the Markov chain induced by the social network matrix. 
In particular, we showed that social network matrices with small second largest eigenvalues
(equivalently, large spectral gaps), which correspond to fast-mixing graphs, will place tight
bounds on the extent of misinformation. The intuition for this result is that in such societies individuals that ultimately
have some influence on the beliefs of forceful agents rapidly inherit the beliefs of the rest 
of the society and thus the beliefs of forceful agents ultimately approach those of the
rest of the society and cannot have a large impact on the consensus beliefs. The extreme
example is provided by expander graphs, where, when the number and the impact of
forceful agents are finite, the extent of misinformation becomes arbitrarily small as the
size of the society becomes large. In contrast, the worst outcomes are obtained when 
there are several forceful agents and forceful agents themselves update their beliefs only 
on the basis of information they obtain from individuals most likely to have received 
their own information previously (i.e., when the graph is slow-mixing). 
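
The role of mixing can be illustrated numerically. The sketch below builds the social network matrix for a complete graph and for a ring, assuming uniform meeting probabilities over neighbors and no idle meetings ($\gamma_{ij} = 0$), and estimates the second largest eigenvalue by power iteration on the subspace orthogonal to the all-ones vector. The helper names are hypothetical. A second largest eigenvalue closer to one indicates slower mixing.

```python
import math
import random

def social_network_matrix(neighbors):
    """T_ij = (1/n)(p_ij/2 + p_ji/2), with p uniform over neighbors and gamma = 0."""
    n = len(neighbors)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            T[i][j] = (0.5 / len(neighbors[i]) + 0.5 / len(neighbors[j])) / n
        T[i][i] = 1.0 - sum(T[i])   # rows sum to one
    return T

def second_eigenvalue(T, iters=3000, seed=0):
    """Power iteration after removing the component along the all-ones vector."""
    n = len(T)
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [vi - mean for vi in v]                      # deflate eigenvalue 1
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        w = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * wi for vi, wi in zip(v, w))       # Rayleigh quotient
        v = w
    return lam

n = 12
complete = [[j for j in range(n) if j != i] for i in range(n)]
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
lam_complete = second_eigenvalue(social_network_matrix(complete))
lam_ring = second_eigenvalue(social_network_matrix(ring))
print(f"complete: {lam_complete:.5f}, ring: {lam_ring:.5f}")
```

In this example the ring, which mixes slowly, has a second largest eigenvalue much closer to one than the complete graph.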

Our second set of results exploits more explicitly the location of forceful agents within
a social network. A given social network will lead to very different types of limiting
behavior depending on the context in which the forceful agents are located. We provided
a tight characterization for graphs with forceful essential edges, that is, graphs
representing societies in which a forceful agent links two disconnected clusters. Such
graphs approximate situations in which forceful agents, such as media outlets or political
leaders, themselves obtain all of their information from a small group of other individuals.
The interesting and striking result in this case is that the excess influence of all of the
members of the small group is the same, even if some of them are not directly linked
to forceful agents. We then extended these findings to more general societies using the 
notion of information bottlenecks. 

Our third set of results provides new efficient graph clustering algorithms for computing
tighter bounds on excess influence.

We view our paper as a first attempt at quantifying misinformation in society. As
such, we made several simplifying assumptions and emphasized characterization results
that apply to general societies. Many areas of future investigation stem from this
endeavor. First, it is important to consider scenarios in which learning and information 
updating are, at least partly, Bayesian. Our non-Bayesian framework is a natural start- 
ing point, both because it is simpler to analyze and because the notion of misinformation 
is more difficult to introduce in Bayesian models. Nevertheless, game theoretic models 
of communication can be used for analyzing situations in which a sender may explicitly 
try to mislead one or several receivers. Second, one can combine a model of communica- 
tion along the lines of our setup with individuals taking actions with immediate payoff 
consequences and also updating on the basis of their payoffs. Misinformation will then
have short-run payoff consequences, but whether it will persist or not will depend on
how informative payoffs are and on the severity of its short-run payoff consequences. 
Third, it would be useful to characterize what types of social networks are more robust 
to the introduction of misinformation and how agents might use simple rules in order to 
avoid misinformation. 

Finally, our approach implies that the society (social network) will ultimately reach 
a consensus, even though this consensus opinion is a random variable. In practice, there 
are widespread differences in beliefs in almost all societies. There is little systematic 
analysis of such differences in beliefs in the literature at the moment, and this is clearly 
an important and challenging area for future research. Our framework suggests two 
fruitful lines of research. First, although a stochastic consensus is eventually reached in 
our model, convergence can be very slow. Thus characterizing the rate of convergence 
to consensus in this class of models might provide insights about what types of societies 
and which sets of issues should lead to such belief differences. Second, if we relax the 
assumption that even forceful agents necessarily obtain some (albeit limited) information 
from others, thus removing the "no man is an island" feature, then it can be shown that 
the society will generally not reach a consensus. Nevertheless, characterizing differences 
in opinions in this case is difficult and requires a different mathematical approach. We 
plan to investigate this issue in future work.

Appendix A

Preliminary Lemmas, Sections [3] and [4] 

This appendix presents two lemmas that will be used in proving the convergence of
agent beliefs (i.e., Theorem 1) and in establishing properties of the social network matrix
$T$ in Appendix C.

The first lemma provides conditions under which a nonnegative $n \times n$ matrix $M$ is
primitive, i.e., there exists a positive integer $k$ such that all entries of the $k$-th power of
$M$, $M^k$, are positive (see [33]). The lemma also provides a positive uniform lower bound
on the entries of the matrix $M^k$ as a function of the entries of $M$ and the properties
of the graph induced by the positive entries of matrix $M$. A version of this lemma was
established in [28]. We omit the proof here since it is not directly relevant to the rest of
the analysis.

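Lemma 5. Let $H$ be a nonnegative $n \times n$ matrix that satisfies the following conditions:

(a) The diagonal entries of $H$ are positive, i.e., $H_{ii} > 0$ for all $i$.

(b) Let $\mathcal{E}$ denote a set of edges such that the graph $(\mathcal{N}, \mathcal{E})$ is connected. For all
$(i, j) \in \mathcal{E}$, the entry $H_{ij}$ is positive, i.e., $\mathcal{E} \subseteq \{(i, j) \mid H_{ij} > 0\}$.

Let $d$ denote the maximum shortest path length between any $i, j$ in the induced graph
$(\mathcal{N}, \mathcal{E})$, and $\eta > 0$ be a scalar given by

\[ \eta = \min\Big\{ \min_{i \in \mathcal{N}} H_{ii}, \ \min_{(i,j) \in \mathcal{E}} H_{ij} \Big\}. \]

Then, we have

\[ [H^d]_{ij} \ge \eta^d \quad \text{for all } i, j. \]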
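
The bound of Lemma 5 can be checked numerically on a small example. The sketch below uses a hypothetical $4 \times 4$ matrix whose positive off-diagonal entries form a path graph; the edge set is taken to be all positive off-diagonal entries, so $\eta$ is the smallest positive entry. The helper names are illustrative only.

```python
from collections import deque

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R

def diameter(H):
    """Maximum shortest-path length in the graph of positive off-diagonal entries."""
    n = len(H)
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v != u and H[u][v] > 0 and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

# Hypothetical example: positive diagonal, edges 0-1, 1-2, 2-3.
H = [[0.5, 0.5, 0.0, 0.0],
     [0.25, 0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5, 0.25],
     [0.0, 0.0, 0.5, 0.5]]
n = len(H)
d = diameter(H)
eta = min(H[i][j] for i in range(n) for j in range(n) if H[i][j] > 0)
Hd = matpow(H, d)
lemma_holds = all(Hd[i][j] >= eta ** d for i in range(n) for j in range(n))
print(d, eta, lemma_holds)
```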



The second lemma considers a sequence $z(k)$ generated by a linear time-varying
update rule, i.e., given some $z(0)$, the sequence $\{z(k)\}$ is generated by

\[ z(k) = H(k) z(k-1) \quad \text{for all } k > 0, \]

where $H(k)$ is a stochastic matrix for all $k \ge 0$. We introduce the matrices
$\Phi(k, s) = H(k) H(k-1) \cdots H(s)$ to relate $z(k+1)$ to $z(s)$ for $s \le k$, i.e.,

\[ z(k+1) = \Phi(k, s) z(s). \]

The lemma shows that, under some assumptions on the entries of the matrix $\Phi(k, s)$, the
disagreement in the components of $z(k)$, defined as the difference between the maximum
and minimum components of $z(k)$, decreases with $k$ and provides a bound on the amount
of decrease.

Lemma 6. Let $\{H(k)\}$ be a sequence of $n \times n$ stochastic matrices. Given any $z(0) \in \mathbb{R}^n$,
let $\{z(k)\}$ be a sequence generated by the linear update rule

\[ z(k) = H(k) z(k-1) \quad \text{for all } k > 0. \tag{39} \]

Assume that there exists some integer $B > 0$ and scalar $\theta > 0$ such that

\[ [\Phi(s + B - 1, s)]_{ij} \ge \theta \quad \text{for all } i, j, \text{ and } s \ge 0. \]

For all $k \ge 0$, define $M(k) \in \mathbb{R}$ and $m(k) \in \mathbb{R}$ as follows:

\[ M(k) = \max_{i \in \mathcal{N}} z_i(k), \qquad m(k) = \min_{i \in \mathcal{N}} z_i(k). \tag{40} \]

Then, for all $s \ge 0$, we have $n\theta \le 1$ and

\[ M(s + B) - m(s + B) \le (1 - n\theta)\big(M(s) - m(s)\big). \]



Proof. In view of the linear update rule (39), we have for all $i$,

\[ z_i(s + B) = \sum_{j=1}^{n} [\Phi(s + B - 1, s)]_{ij} \, z_j(s) \quad \text{for all } s \ge 0. \]

We rewrite the preceding relation as

\[ z_i(s + B) = \sum_{j=1}^{n} \theta z_j(s) + \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij} \, z_j(s), \tag{41} \]

where $[\tilde{\Phi}(s + B - 1, s)]_{ij} = [\Phi(s + B - 1, s)]_{ij} - \theta$ for all $i, j$. Since by assumption
$[\Phi(s + B - 1, s)]_{ij} \ge \theta$ for all $i, j$, we have

\[ [\tilde{\Phi}(s + B - 1, s)]_{ij} \ge 0 \quad \text{for all } i, j. \]

Moreover, since the matrices $H(k)$ are stochastic, the product matrix $\Phi(s + B - 1, s)$ is
also stochastic, and therefore we have

\[ \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij} = 1 - n\theta \quad \text{for all } i. \]

From the preceding two relations, we obtain $1 - n\theta \ge 0$ and

\[ (1 - n\theta)\, m(s) \le \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij} \, z_j(s) \le (1 - n\theta)\, M(s), \]

where $m(s)$ and $M(s)$ are defined in Eq. (40). Combining this relation with Eq. (41),
we obtain for all $i$

\[ (1 - n\theta)\, m(s) \le z_i(s + B) - \sum_{j=1}^{n} \theta z_j(s) \le (1 - n\theta)\, M(s). \]

Since this relation holds for all $i$, we have

\[ (1 - n\theta)\, m(s) \le m(s + B) - \sum_{j=1}^{n} \theta z_j(s), \qquad M(s + B) - \sum_{j=1}^{n} \theta z_j(s) \le (1 - n\theta)\, M(s), \]

from which we obtain

\[ M(s + B) - m(s + B) \le (1 - n\theta)\big(M(s) - m(s)\big) \quad \text{for all } s \ge 0. \]

□
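
The contraction in Lemma 6 can be illustrated with a short numerical experiment. The sketch below takes the special case $B = 1$, so that every update matrix itself has entries of at least $\theta$; the way the random stochastic matrices are generated and the helper names are assumptions made for the illustration.

```python
import random

def random_stochastic(n, theta, rng):
    """Row-stochastic matrix with all entries at least theta (needs n*theta <= 1)."""
    rows = []
    for _ in range(n):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        rows.append([theta + (1 - n * theta) * r / total for r in raw])
    return rows

def step(H, z):
    """One application of the linear update z <- H z."""
    return [sum(h * zj for h, zj in zip(row, z)) for row in H]

rng = random.Random(0)
n, theta = 5, 0.05
z = [rng.uniform(-10, 10) for _ in range(n)]
contraction_ok = True
for _ in range(50):
    before = max(z) - min(z)
    z = step(random_stochastic(n, theta, rng), z)
    after = max(z) - min(z)
    # Disagreement should shrink by at least the factor (1 - n*theta).
    contraction_ok = contraction_ok and after <= (1 - n * theta) * before + 1e-12
print(contraction_ok, max(z) - min(z))
```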



Appendix B 

Properties of the Mean Interaction and Transition Matrices, 
Sections [3] and [4] 

We establish some properties of the mean interaction matrix $W$ and the transition
matrices $\Phi(k, s)$ under the assumptions discussed in Section 2.2. Recall that transition
matrices are given by

\[ \Phi(k, s) = W(k) W(k-1) \cdots W(s+1) W(s) \quad \text{for all } k \text{ and } s \text{ with } k \ge s, \tag{42} \]

with $\Phi(k, k) = W(k)$ for all $k$. Also note that the mean interaction matrix is given by
$W = E[W(k)]$ for all $k$. In view of the belief update model, the entries of the
matrix $W$ can be written as follows. For all $i \in \mathcal{N}$, the diagonal entries are given by



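\[ [W]_{ii} = 1 - \frac{1 + \sum_{j \ne i} p_{ji}}{n} + \frac{1}{n} \sum_{j \ne i} p_{ij} \Big( \frac{\beta_{ij}}{2} + \alpha_{ij} \epsilon + \gamma_{ij} \Big) + \frac{1}{n} \sum_{j \ne i} p_{ji} \Big( \frac{\beta_{ji}}{2} + \alpha_{ji} + \gamma_{ji} \Big), \tag{43} \]

and for all $i \ne j \in \mathcal{N}$, the off-diagonal entries are given by

\[ [W]_{ij} = \frac{1}{n} \Big[ p_{ij} \Big( \frac{\beta_{ij}}{2} + \alpha_{ij} (1 - \epsilon) \Big) + p_{ji} \frac{\beta_{ji}}{2} \Big]. \tag{44} \]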



Using the assumptions of Section 2.2, Lemma 5, and the explicit expressions for the
entries of the matrix $W$, we have the following result.

Lemma 7. Let $d$ be the maximum shortest path length between any $i, j$ in the graph
$(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)], and $\eta$ be a scalar given by

\[ \eta = \min\Big\{ \min_{i \in \mathcal{N}} [W]_{ii}, \ \min_{(i,j) \in \mathcal{E}} [W]_{ij} \Big\} \tag{45} \]

[cf. Eqs. (43) and (44)].

(a) The scalar $\eta$ is positive and we have

\[ [W^d]_{ij} \ge \eta^d \quad \text{for all } i, j. \]

(b) We have

\[ P\Big\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \Big\} \ge \frac{\eta^d}{2} \quad \text{for all } s \ge 0, \ i, \text{ and } j. \]



Proof. (a) We show that under Assumptions 1 and 3, the mean interaction matrix $W$
has positive diagonal entries and the set $\mathcal{E}$ [cf. Eq. (3)] is a subset of the link set induced
by the positive elements of $W$. Together with the Connectivity assumption, part (a)
then follows from Lemma 5.

By Assumption 1, we have for all $i$, $\sum_{j \ne i} p_{ij} = 1$ and $p_{ij} \ge 0$ for all $j$. This implies
that $\sum_{j \ne i} p_{ji} \le n - 1$ and therefore

\[ 1 - \frac{1 + \sum_{j \ne i} p_{ji}}{n} \ge 0 \quad \text{for all } i. \]

Since $\sum_{j \ne i} p_{ij} = 1$, there exists some $j$ such that $p_{ij} > 0$, i.e., $(i, j) \in \mathcal{E}$. In view
of the information exchange model, we have $\beta_{ij} > 0$ or $\alpha_{ij} > 0$ or $\gamma_{ij} > 0$, implying that

\[ \frac{1}{n} \sum_{j \ne i} p_{ij} \Big( \frac{\beta_{ij}}{2} + \alpha_{ij} \epsilon + \gamma_{ij} \Big) > 0. \]

Combining the preceding two relations with Eq. (43), we obtain

\[ [W]_{ii} > 0 \quad \text{for all } i. \tag{46} \]

We next show that for any link in the set $\mathcal{E}$, the entry $[W]_{ij}$ is positive, i.e.,

\[ \mathcal{E} \subseteq \{ (i, j) \mid [W]_{ij} > 0 \}. \]

For any $(i, j) \in \mathcal{E}$, we have $p_{ij} > 0$, and therefore $\beta_{ij} + \alpha_{ij} > 0$ (cf. Assumption 3). This
implies that

\[ p_{ij} \Big( \frac{\beta_{ij}}{2} + \alpha_{ij} (1 - \epsilon) \Big) > 0, \]

which by Eq. (44) yields $[W]_{ij} > 0$. Together with Eq. (46), this shows that the scalar
$\eta$ defined in (45) is positive. By Assumption 2, the graph $(\mathcal{N}, \mathcal{E})$ is connected. Using
the identification $H = W$ in Lemma 5, we see that the conditions of this lemma are
satisfied, establishing part (a).

(b) For all $i, j$ and $s \ge 0$, we have

\[ P\Big\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \Big\} = P\Big\{ 1 - [\Phi(s + d - 1, s)]_{ij} \le 1 - \frac{\eta^d}{2} \Big\} = 1 - P\Big\{ 1 - [\Phi(s + d - 1, s)]_{ij} > 1 - \frac{\eta^d}{2} \Big\}. \tag{47} \]

The Markov inequality states that for any nonnegative random variable $Y$ with a finite
mean $E[Y]$, the probability that the outcome of the random variable $Y$ exceeds any
given scalar $\delta > 0$ satisfies

\[ P\{ Y \ge \delta \} \le \frac{E[Y]}{\delta}. \]

By applying the Markov inequality to the random variable $1 - [\Phi(s + d - 1, s)]_{ij}$ [which
is nonnegative and has a finite expectation in view of the stochasticity of the matrix
$\Phi(s + d - 1, s)$ for all $s \ge 0$], we obtain

\[ P\Big\{ 1 - [\Phi(s + d - 1, s)]_{ij} \ge 1 - \frac{\eta^d}{2} \Big\} \le \frac{E\big[ 1 - [\Phi(s + d - 1, s)]_{ij} \big]}{1 - \eta^d / 2}. \]

Combining with Eq. (47), this yields

\[ P\Big\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \Big\} \ge 1 - \frac{E\big[ 1 - [\Phi(s + d - 1, s)]_{ij} \big]}{1 - \eta^d / 2}. \tag{48} \]

By the definition of the transition matrices [cf. Eq. (42)], we have

\[ E[\Phi(s + d - 1, s)] = E[W(s + d - 1) W(s + d - 2) \cdots W(s)] = W^d, \]

where the second equality follows from the assumption that $W(k)$ is independent and
identically distributed over $k$. By part (a), this implies that

\[ \big[ E[\Phi(s + d - 1, s)] \big]_{ij} \ge \eta^d \quad \text{for all } i, j, \]

which combined with Eq. (48) yields

\[ P\Big\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \Big\} \ge 1 - \frac{1 - \eta^d}{1 - \eta^d / 2} = \frac{\eta^d / 2}{1 - \eta^d / 2} \ge \frac{\eta^d}{2}, \]

establishing the desired result. □
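
The off-diagonal expression in Eq. (44) and the bound in part (a) can be checked on a toy example. The sketch below computes $E[W(k)]$ by enumerating all meetings, under an assumed form of the update matrices: the selected agent $i$ meets $j$ with probability $p_{ij}$; with probability $\beta_{ij}$ both average, with probability $\alpha_{ij}$ agent $i$ moves to $\epsilon x_i + (1 - \epsilon) x_j$ while $j$ is unchanged, and otherwise nothing happens. The three-agent path and all parameter values are hypothetical.

```python
def mean_interaction_matrix(p, beta, alpha, eps):
    """Exact E[W(k)] by enumerating the selected agent, the partner, and the meeting type."""
    n = len(p)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or p[i][j] == 0:
                continue
            gamma = 1 - beta[i][j] - alpha[i][j]
            outcomes = [
                (beta[i][j], {i: {i: 0.5, j: 0.5}, j: {i: 0.5, j: 0.5}}),  # both average
                (alpha[i][j], {i: {i: eps, j: 1 - eps}}),                   # j influences i
                (gamma, {}),                                                # no update
            ]
            for prob, changed in outcomes:
                weight = p[i][j] * prob / n
                for r in range(n):
                    for c, val in changed.get(r, {r: 1.0}).items():
                        W[r][c] += weight * val
    return W

def offdiag_formula(p, beta, alpha, eps, i, j):
    """Right-hand side of Eq. (44)."""
    n = len(p)
    return (p[i][j] * (beta[i][j] / 2 + alpha[i][j] * (1 - eps))
            + p[j][i] * beta[j][i] / 2) / n

# Hypothetical path 0 - 1 - 2 in which agent 0 is forceful toward agent 1.
eps = 0.1
p = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
beta = [[0, 1, 0], [0.4, 0, 1], [0, 1, 0]]
alpha = [[0, 0, 0], [0.6, 0, 0], [0, 0, 0]]
n, d = 3, 2   # d = 2 is the diameter of this path

W = mean_interaction_matrix(p, beta, alpha, eps)
formula_matches = all(abs(W[i][j] - offdiag_formula(p, beta, alpha, eps, i, j)) < 1e-12
                      for i in range(n) for j in range(n) if i != j)
eta = min([W[i][i] for i in range(n)]
          + [W[i][j] for i in range(n) for j in range(n) if p[i][j] > 0])
W2 = [[sum(W[i][k] * W[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
bound_holds = all(W2[i][j] >= eta ** d for i in range(n) for j in range(n))
print(formula_matches, eta, bound_holds)
```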

The next two lemmas establish properties of transition matrices. 
Lemma 8.

(a) $[\Phi(k, s)]_{ii} \ge \epsilon^{k - s + 1}$ for all $k$ and $s$ with $k \ge s$, and all $i \in \mathcal{N}$ with probability one.

(b) Assume that there exist integers $K, B \ge 1$ and a scalar $\theta > 0$ such that for some
$s \ge 0$ and $k \in \{0, \ldots, K - 1\}$, we have

\[ [\Phi(s + (k + 1)B - 1, s + kB)]_{ij} \ge \theta \quad \text{for some } i, j. \]

Then,

\[ [\Phi(s + KB - 1, s)]_{ij} \ge \theta \epsilon^{KB - 1} \quad \text{with probability one.} \]

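Proof. (a) We let $s$ be arbitrary and prove the relation by induction on $k$. By the
definition of the transition matrices [cf. Eq. (42)], we have $\Phi(s, s) = W(s)$. Thus, the
relation $[\Phi(k, s)]_{ii} \ge \epsilon^{k - s + 1}$ holds for $k = s$ from the definition of the update matrix
$W(k)$ [cf. Eq. (5)]. Suppose now that the relation holds for some $k \ge s$ and consider
$k + 1$. We have

\[ [\Phi(k + 1, s)]_{ii} = \sum_{h=1}^{n} [W(k + 1)]_{ih} [\Phi(k, s)]_{hi} \ge [W(k + 1)]_{ii} [\Phi(k, s)]_{ii} \ge \epsilon^{k - s + 2}, \]

where the first inequality follows from the nonnegativity of the entries of $\Phi(k, s)$, and
the second inequality follows from the inductive hypothesis together with $[W(k + 1)]_{ii} \ge \epsilon$.

(b) For any $s \ge 0$, we have

\[ [\Phi(s + KB - 1, s)]_{ij} = \sum_{h=1}^{n} [\Phi(s + KB - 1, s + (k + 1)B)]_{ih} [\Phi(s + (k + 1)B - 1, s)]_{hj} \]
\[ \ge [\Phi(s + KB - 1, s + (k + 1)B)]_{ii} [\Phi(s + (k + 1)B - 1, s)]_{ij} \]
\[ \ge \epsilon^{(K - k - 1)B} [\Phi(s + (k + 1)B - 1, s)]_{ij}, \]

where the last inequality follows from part (a). Similarly,

\[ [\Phi(s + (k + 1)B - 1, s)]_{ij} = \sum_{h=1}^{n} [\Phi(s + (k + 1)B - 1, s + kB)]_{ih} [\Phi(s + kB - 1, s)]_{hj} \]
\[ \ge [\Phi(s + (k + 1)B - 1, s + kB)]_{ij} [\Phi(s + kB - 1, s)]_{jj} \ge \theta \epsilon^{kB}, \]

where the second inequality follows from the assumption $[\Phi(s + (k + 1)B - 1, s + kB)]_{ij} \ge \theta$
and part (a). Combining the preceding two relations yields
$[\Phi(s + KB - 1, s)]_{ij} \ge \theta \epsilon^{(K - 1)B} \ge \theta \epsilon^{KB - 1}$, which is the desired result. □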

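Lemma 9. We have

\[ P\Big\{ [\Phi(s + n^2 d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \epsilon^{n^2 d - 1}, \ \text{for all } i, j \Big\} \ge \Big( \frac{\eta^d}{2} \Big)^{n^2} \quad \text{for all } s \ge 0, \]

where the scalar $\eta > 0$ and the integer $d$ are the constants defined in Lemma 7.

Proof. Consider a particular ordering of the elements of an $n \times n$ matrix and let
$k_{ij} \in \{0, \ldots, n^2 - 1\}$ denote the unique index for element $(i, j)$. From Lemma 8(b), we have

\[ P\Big\{ [\Phi(s + n^2 d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \epsilon^{n^2 d - 1}, \ \text{for all } i, j \Big\} \]
\[ \ge P\Big\{ [\Phi(s + (k_{ij} + 1)d - 1, s + k_{ij} d)]_{ij} \ge \frac{\eta^d}{2}, \ \text{for all } i, j \Big\} \]
\[ = \prod_{i, j} P\Big\{ [\Phi(s + (k_{ij} + 1)d - 1, s + k_{ij} d)]_{ij} \ge \frac{\eta^d}{2} \Big\} \]
\[ \ge \Big( \frac{\eta^d}{2} \Big)^{n^2}. \]

Here the equality follows from the independence of the random events

\[ \Big\{ [\Phi(s + (k + 1)d - 1, s + kd)]_{ij} \ge \frac{\eta^d}{2} \Big\} \]

over all $k = 0, \ldots, n^2 - 1$, and the last inequality follows from Lemma 7(b). □

Appendix C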

Properties of the Social Network Matrix, Section [4] 

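The next lemma studies the properties of the social network matrix $T$. Note that
the entries of the matrix $T$ can be written as follows: For all $i \in \mathcal{N}$, the diagonal entries
are given by

\[ [T]_{ii} = 1 - \frac{1 + \sum_{j \ne i} p_{ji}}{n} + \frac{1}{n} \sum_{j \ne i} p_{ij} \Big( \frac{1 - \gamma_{ij}}{2} + \gamma_{ij} \Big) + \frac{1}{n} \sum_{j \ne i} p_{ji} \Big( \frac{1 - \gamma_{ji}}{2} + \gamma_{ji} \Big), \tag{49} \]

and for all $i \ne j \in \mathcal{N}$, the off-diagonal entries are given by

\[ [T]_{ij} = \frac{1}{n} \Big[ p_{ij} \frac{1 - \gamma_{ij}}{2} + p_{ji} \frac{1 - \gamma_{ji}}{2} \Big]. \tag{50} \]

Lemma 10. Let $T$ be the social network matrix [cf. Eq. (9)]. Then, we have: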



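(a) The matrix $T^k$ converges to a stochastic matrix with identical rows $\frac{1}{n} e'$ as $k$ goes
to infinity, i.e.,

\[ \lim_{k \to \infty} T^k = \frac{1}{n} e e'. \]

(b) For any $z(0) \in \mathbb{R}^n$, let the sequence $z(k)$ be generated by the linear update rule

\[ z(k) = T z(k - 1) \quad \text{for all } k > 0. \]

For all $k \ge 0$, define $M(k) \in \mathbb{R}$ and $m(k) \in \mathbb{R}$ as follows:

\[ M(k) = \max_{i \in \mathcal{N}} z_i(k), \qquad m(k) = \min_{i \in \mathcal{N}} z_i(k). \]

Then, for all $k \ge 0$, we have

\[ M(k) - m(k) \le \delta^k \big( M(0) - m(0) \big). \]

Here $\delta > 0$ is a constant given by

\[ \delta = (1 - n \chi^d)^{1/d}, \qquad \chi = \min_{(i,j) \in \mathcal{E}} \Big\{ \frac{1}{n} \Big[ p_{ij} \frac{1 - \gamma_{ij}}{2} + p_{ji} \frac{1 - \gamma_{ji}}{2} \Big] \Big\}, \]

and $d$ is the maximum shortest path length in the graph $(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)].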



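Proof. (a) By Assumption 1, we have for all $i$, $\sum_{j \ne i} p_{ij} = 1$ and $p_{ij} \ge 0$ for all $j$. This
implies that $\sum_{j \ne i} p_{ji} \le n - 1$ and therefore

\[ 1 - \frac{1 + \sum_{j \ne i} p_{ji}}{n} \ge 0 \quad \text{for all } i. \tag{51} \]

Since $\sum_{j \ne i} p_{ij} = 1$ for all $i$, there exists some $j$ such that $p_{ij} > 0$, i.e., $(i, j) \in \mathcal{E}$. By
Assumption 3, this implies that $\beta_{ij} + \alpha_{ij} = 1 - \gamma_{ij} > 0$, showing that $[T]_{ii} > 0$ for all $i$.
Similarly, for any $(i, j) \in \mathcal{E}$, we have $p_{ij} > 0$ and therefore $1 - \gamma_{ij} > 0$, showing that
$[T]_{ij} > 0$ for all $(i, j) \in \mathcal{E}$. Using Eq. (51) in Eqs. (49) and (50), it follows that for all $i$

\[ [T]_{ii} \ge [T]_{ij} \quad \text{for all } j. \]

Thus, we can use Lemma 5 with the identification

\[ \chi = \min_{(i,j) \in \mathcal{E}} \Big\{ \frac{1}{n} \Big[ p_{ij} \frac{1 - \gamma_{ij}}{2} + p_{ji} \frac{1 - \gamma_{ji}}{2} \Big] \Big\} \tag{52} \]

and obtain

\[ [T^d]_{ij} \ge \chi^d \quad \text{for all } i, j, \tag{53} \]

i.e., $T$ is a primitive matrix and therefore the Markov chain with transition probability
matrix $T$ is regular. It follows from Theorem 3(a) that for any $z(0) \in \mathbb{R}^n$, we have

\[ \lim_{k \to \infty} T^k z(0) = e \bar{z}, \]

where $\bar{z}$ is given by $\bar{z} = \pi' z(0)$ for some probability vector $\pi$. Since $T$ is a stochastic and
symmetric matrix, it is doubly stochastic. Denoting $z(k) = T^k z(0)$, this implies that
the average of the entries of the vector $z(k)$ is the same for all $k$, i.e.,

\[ \frac{1}{n} \sum_{i=1}^{n} z_i(k) = \frac{1}{n} \sum_{i=1}^{n} z_i(0) \quad \text{for all } k \ge 0. \]

Combining the preceding two relations, we obtain

\[ \bar{z} = \lim_{k \to \infty} \frac{1}{n} \sum_{i=1}^{n} z_i(k) = \frac{1}{n} \sum_{i=1}^{n} z_i(0). \]

Since this holds for any $z(0)$, we have $\pi = \frac{1}{n} e$, establishing part (a).

(b) In view of Eq. (53), we can use Lemma 6 with the identifications $H(k) = T$, $B = d$, and
$\theta = \chi^d$, where $\chi$ is defined in Eq. (52), and obtain

\[ M(k) - m(k) \le (1 - n \chi^d)^{k/d} \big( M(0) - m(0) \big). \]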



□ 
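
Both parts of Lemma 10 can be illustrated on a small example. The sketch below builds $T$ from Eq. (50) for a hypothetical four-agent path with uniform meeting probabilities and $\gamma_{ij} = 0$, and checks the disagreement bound at multiples of $d$ (where it follows directly from Lemma 6) as well as convergence to the average of the initial values.

```python
def social_network_matrix(p, gamma):
    """Off-diagonal entries from Eq. (50); diagonal entries make each row sum to one."""
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                T[i][j] = (p[i][j] * (1 - gamma[i][j]) / 2
                           + p[j][i] * (1 - gamma[j][i]) / 2) / n
        T[i][i] = 1.0 - sum(T[i])
    return T

# Hypothetical path 0 - 1 - 2 - 3 with uniform meeting probabilities.
p = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]
gamma = [[0.0] * 4 for _ in range(4)]
n, d = 4, 3   # d = 3 is the diameter of this path

T = social_network_matrix(p, gamma)
chi = min(T[i][j] for i in range(n) for j in range(n) if p[i][j] > 0)

z = [1.0, 0.0, 0.0, 0.0]
spread0 = max(z) - min(z)
bound_ok = True
for k in range(1, 601):
    z = [sum(T[i][j] * z[j] for j in range(n)) for i in range(n)]
    if k % d == 0:
        bound = (1 - n * chi ** d) ** (k // d) * spread0
        bound_ok = bound_ok and (max(z) - min(z) <= bound + 1e-12)
print(chi, bound_ok, z)
```

The bound is loose in this example: the observed disagreement decays much faster than the guaranteed rate.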



Appendix D 

Characterization of the Mean Commute Time, Section [5] 

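First, we characterize the mean commute time between two nodes for a random walk
on an undirected graph using the Dirichlet principle and its dual, Thompson's principle.

Definition 8. Consider a random walk on a weighted undirected graph $(\mathcal{N}, \mathcal{A})$ with
weight $w_{ij}$ associated to each edge $\{i, j\}$. Define the Dirichlet form as follows. For
functions $g : \mathcal{N} \to \mathbb{R}$ write

\[ \mathcal{E}(g, g) = \frac{1}{2} \sum_{i} \sum_{j} w_{ij} \big( g(i) - g(j) \big)^2. \]

Lemma 11. Consider a random walk on a weighted undirected graph with weight $w_{ij}$
associated to each edge $\{i, j\}$. For the mean commute time between distinct nodes $a$ and $b$
we have

\[ m_{ab} + m_{ba} = w \sup\Big\{ \frac{1}{\mathcal{E}(g, g)} : 0 \le g \le 1, \ g(a) = 1, \ g(b) = 0 \Big\}, \tag{54} \]

where $m_{ab}$ is the mean first passage time from $a$ to $b$, and $w$ is the total edge weight. Moreover,

\[ m_{ab} + m_{ba} = w \inf\Big\{ \frac{1}{2} \sum_{i} \sum_{j} \frac{f_{ij}^2}{w_{ij}} : f \text{ is a unit flow from } a \text{ to } b \Big\}, \tag{55} \]

where $w = \sum_{i,j} w_{ij}$ is the total edge weight.

Proof. See Section 7.2 of [2].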



□

It is worth mentioning that the two forms of the mean commute time characterization
in Lemma 11 are dual to each other. The first form is a corollary of the Dirichlet principle,
while the second is an immediate result of Thompson's principle. Using the electric circuit
analogy, we can think of the function $g(i)$ as the potential associated with node $i$, and the flow $f_{ij}$ as
the current on edge $\{i, j\}$ with resistance $1/w_{ij}$. The expressions in (54) and (55) are equivalent
descriptions of minimum energy dissipation in such an electric network. Hence, we can
interpret the mean commute time between two particular nodes as the effective resistance
between such nodes in a resistive network. This allows us to use the Monotonicity Law to
obtain simpler bounds for the mean commute time.

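Lemma 12. (Monotonicity Law) Let $\tilde{w}_{ij} \le w_{ij}$ be the edge weights for two undirected
graphs. Then,

\[ m_{av} + m_{va} \le \frac{w}{\tilde{w}} \big( \tilde{m}_{av} + \tilde{m}_{va} \big) \quad \text{for all } a, v, \]

where $w = \sum_{i,j} w_{ij}$ and $\tilde{w} = \sum_{i,j} \tilde{w}_{ij}$ are the total edge weights.

Proof. Let $f^*$ and $\tilde{f}^*$ be the optimal solutions of (55) for the original and modified
graphs, respectively. We can write

\[ m_{av} + m_{va} = w \cdot \frac{1}{2} \sum_{i} \sum_{j} \frac{(f^*_{ij})^2}{w_{ij}} \le w \cdot \frac{1}{2} \sum_{i} \sum_{j} \frac{(\tilde{f}^*_{ij})^2}{w_{ij}} \le w \cdot \frac{1}{2} \sum_{i} \sum_{j} \frac{(\tilde{f}^*_{ij})^2}{\tilde{w}_{ij}} = \frac{w}{\tilde{w}} \big( \tilde{m}_{av} + \tilde{m}_{va} \big), \]

where the first inequality follows from the optimality of $f^*$ and the feasibility of $\tilde{f}^*$. □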

By the electric network analogy, Lemma 12 states that increasing resistances in a
circuit increases the effective resistance between any two nodes in the network. The
Monotonicity Law can be extremely useful in providing simple bounds for mean commute
times.
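
The effective resistance interpretation can be verified numerically. The sketch below computes mean first passage times by solving the standard linear system for hitting times, computes the effective resistance by grounding one node in the weighted Laplacian, and checks that the commute time equals the total edge weight times the effective resistance. It also compares the original weights with a hypothetical network in which one weight is reduced. The example weights and helper names are assumptions made for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def hitting_time(w, a, b):
    """Mean first passage time from a to b for the walk with P_ij = w_ij / sum_k w_ik."""
    others = [i for i in range(len(w)) if i != b]
    deg = [sum(row) for row in w]
    A = [[(1.0 if i == j else 0.0) - w[i][j] / deg[i] for j in others] for i in others]
    return solve(A, [1.0] * len(others))[others.index(a)]

def effective_resistance(w, a, b):
    """Voltage at a when unit current enters at a and leaves at the grounded node b."""
    others = [i for i in range(len(w)) if i != b]
    deg = [sum(row) for row in w]
    L = [[(deg[i] if i == j else 0.0) - w[i][j] for j in others] for i in others]
    return solve(L, [1.0 if i == a else 0.0 for i in others])[others.index(a)]

def total_weight(w):
    return sum(sum(row) for row in w)   # sum over ordered pairs (i, j)

# Hypothetical weights: edges 0-1 (1), 1-2 (2), 0-2 (1), 2-3 (0.5).
w = [[0, 1, 1, 0], [1, 0, 2, 0], [1, 2, 0, 0.5], [0, 0, 0.5, 0]]
# Modified network with a smaller weight (larger resistance) on edge 1-2.
w_mod = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0.5], [0, 0, 0.5, 0]]

a, b = 0, 3
commute = hitting_time(w, a, b) + hitting_time(w, b, a)
commute_mod = hitting_time(w_mod, a, b) + hitting_time(w_mod, b, a)
resistance = effective_resistance(w, a, b)
print(commute, total_weight(w) * resistance, commute_mod)
```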